Description

The present invention relates to isolated human serine protease (PSP1) polynucleotides, their homologs and isoforms
and polymorphic variants and their detection; to essentially pure PSP1 proteins; and to compositions and methods
of producing and using PSP1 polynucleotides and proteins.

Mutations in the presenilins (PS-1 and PS-2) account for ~95% (75% and 20%, respectively) of all cases of early
onset familial Alzheimer's disease (FAD). See R. Sherrington et al., Nature 375, 754-760 (1995); E.I. Rogaev et al.,
Nature 376, 775-778 (1995); and E. Levy-Lahad et al., Science 269, 973-977 (1995). The presenilins are highly homologous
(67% identical), multi-membrane spanning proteins whose function is unknown.

It has been demonstrated that the 46 kDa full-length PS-1 protein is normally processed to 28 kDa and 18 kDa 
fragments; PS-2 has been reported to be similarly cleaved. See M. Mercken et al., FEBS Letters 389, 297-303 (1996).
The predicted cleavage site(s) to account for fragments of this size would be in a region of the protein coded for by 
exon 8 and exon 9. Exon 8 is a hot spot for mutations leading to FAD. Thus, this region of PS-1, and potentially the 
cleavage of PS-1 in this region by a presenilinase protease, are important events in the functionality of the protein. A 
region of PS-1 spanning exons 8-11 has been demonstrated in the present invention to specifically bind a protease,
PSP1, whose activity against its endogenous substrates and/or ability to bind to PS-1 are important in the pathology
of neurodegeneration associated with AD, frontal lobe dementia, cortical Lewy body disease, dementia of Parkinson's
disease, acute and chronic phases of degeneration following stroke or head injury, neuronal degeneration found in 
motor neurone disease, AIDS dementia and chronic epilepsy. Thus, a need exists for provision of the nucleotide and
amino acid sequences corresponding to PSP1, for modulators of PSP1 binding to PS-1, and/or modulators of PSP1's
proteolytic activity, for methods to identify such modulators and for reagents useful in such methods. 

Accordingly, one aspect of the present invention is an isolated polynucleotide encoding a biologically active PSP1 
polypeptide. 

Another aspect of the invention is an isolated polynucleotide selected from the group consisting of: 

(a) a polynucleotide encoding PSP1-1 having the nucleotide sequence as set forth in SEQ ID NO: 24 from nucleotide
603 to 1979; and

(b) a polynucleotide substantially similar to SEQ ID NO: 24. 

Another aspect of the invention is an isolated polynucleotide selected from the group consisting of: 

(a) a polynucleotide encoding PSP1-2 having the nucleotide sequence as set forth in SEQ ID NO: 23 from nucleotide
603 to 1979; and

(b) a polynucleotide substantially similar to SEQ ID NO: 23. 

Another aspect of the invention is an isolated polynucleotide selected from the group consisting of: 

(a) a polynucleotide encoding PSP1-3 having the nucleotide sequence as set forth in SEQ ID NO: 26 from nucleotide
603 to 1736; and

(b) a polynucleotide substantially similar to SEQ ID NO: 26. 

Another aspect of the invention is an isolated polynucleotide selected from the group consisting of: 

(a) a polynucleotide encoding PSP1-4 having the nucleotide sequence as set forth in SEQ ID NO: 28 from nucle- 
otide 603 to 1913; and 

(b) a polynucleotide substantially similar to SEQ ID NO: 28.

In a further aspect the invention provides any isolated polynucleotide as above defined wherein nucleotides 672 
and 1435 are independently selected from C and T, hereinafter referred to as 'polymorphic variants'. 

Another aspect of the invention is the functional polypeptides encoded by the polynucleotides of the invention. 

Another aspect of the invention is an antisense oligonucleotide comprising a sequence which is capable of binding 
to the polynucleotides of the invention or D87258. 

Another aspect of the invention is modulators of the polypeptides of the invention or of D87258. 

Another aspect of the invention is a method for assaying a medium for the presence of a substance that modulates 
PSP1 or D87258 activity by affecting the binding of PSP1 or D87258 to cellular binding partners comprising the steps of: 

(a) providing a PSP1 or D87258 protein having the amino acid sequence of PSP1-1, PSP1-2, PSP1-3 or PSP1-4
or D87258, or a functional derivative or polymorphic variant thereof and a cellular binding partner or synthetic
analog thereof;

(b) incubating with a test substance which is suspected of modulating PSP1 or D87258 activity under conditions 
which permit the formation of a PSP1 or D87258 protein/cellular binding partner complex; 

(c) assaying for the presence of the complex, free PSP1 or D87258 protein or free cellular binding partner; and 

(d) comparing to a control to determine the effect of the substance.

Another aspect of the invention is a method for assaying a medium for the presence of a substance that modulates 
PSP1 or D87258 activity by inhibiting proteolytic activity on a cellular substrate comprising the steps of: 

(a) providing a PSP1 or D87258 protein having the amino acid sequence of PSP1-1, PSP1-2, PSP1-3 or PSP1-4
or D87258, or a functional fragment or polymorphic variant thereof and a cellular substrate or synthetic analog
thereof; 

(b) incubating with a test substance which is suspected of inhibiting PSP1 or D87258 activity under conditions
which permit the formation of a PSP1 enzyme/substrate complex and subsequent cleavage of the substrate; 

(c) assaying for the presence of proteolytically cleaved substrate; and

(d) comparing to a control to determine the effect of the substance. 
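
Step (d) of the proteolytic assay above can be illustrated with a small calculation. The following is a minimal sketch only, assuming that cleaved substrate is read out as a single numeric signal (for example, fluorescence units); the function name `percent_inhibition`, the optional blank correction and the example readings are hypothetical and are not taken from the present description.

```python
def percent_inhibition(test_signal, control_signal, blank_signal=0.0):
    """Percent reduction in cleaved-substrate signal relative to a control.

    test_signal    -- signal measured in the presence of the test substance
    control_signal -- signal measured without the test substance
    blank_signal   -- background signal without enzyme (assumed, optional)
    """
    control_net = control_signal - blank_signal
    if control_net <= 0:
        raise ValueError("control signal must exceed the blank signal")
    test_net = test_signal - blank_signal
    # 0 means no effect; 100 means cleavage was fully suppressed;
    # a negative value would indicate enhanced cleavage.
    return 100.0 * (1.0 - test_net / control_net)


# Hypothetical readings, for illustration only.
example = percent_inhibition(test_signal=30.0, control_signal=110.0, blank_signal=10.0)
print(round(example, 1))
```

A substance giving a value near zero would be scored as having no effect on proteolytic activity under this scheme.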

Another aspect of the invention is a method for assaying for the presence of a substance that modulates PSP1 or 
D87258 activity by direct binding to PSP1 or D87258 protein comprising the steps of:

(a) providing a labelled PSP1 or D87258 protein having the amino acid sequence of PSP1-1, PSP1-2, PSP1-3 or
PSP1-4 or D87258 or a functional derivative or polymorphic variant thereof;

(b) providing solid support-associated modulator candidates; 

(c) incubating a mixture of the labelled PSP1 or D87258 protein with the support-associated modulator candidates
under conditions which can permit the formation of a PSP1 protein/modulator candidate complex; 

(d) separating the solid support from free soluble labelled PSP1 or D87258 protein; 

(e) assaying for the presence of solid support-associated labelled protein; 

(f) isolating the solid support complexed with labelled PSP1 or D87258 protein; and 

(g) identifying the modulator candidate. 

Another aspect of the invention is PSP1 or D87258 protein modulating compounds identified by the methods of 
the invention. 

Another aspect of the invention is a method for the treatment of a patient having need to modulate PSP1 or D87258 
activity comprising administering to the patient a therapeutically effective amount of the modulating compounds of the 
invention. 

Another aspect of the invention is a method of diagnosing conditions associated with PSP1 or D87258 protein 
deficiency which comprises: 

(a) isolating a polynucleotide sample from an individual; 

(b) assaying the polynucleotide sample and a polynucleotide of the invention encoding PSP1 or D87258; and 

(c) comparing differences between the polynucleotide sample and the PSP1 or D87258 polynucleotide, wherein
any differences indicate mutations in the PSP1 or D87258 sequence. 
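
The comparison in step (c) can be sketched as a position-by-position check. This is an illustrative sketch only, assuming the sample and reference sequences have already been aligned to equal length; the function name `find_differences` and the short toy sequences are hypothetical and are not PSP1 or D87258 sequences.

```python
def find_differences(sample, reference):
    """Return (position, reference_base, sample_base) for every mismatch.

    Positions are 1-based. Both sequences are assumed to be pre-aligned
    and of equal length; no alignment is performed here.
    """
    if len(sample) != len(reference):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = zip(reference.upper(), sample.upper())
    return [(i + 1, ref, obs) for i, (ref, obs) in enumerate(pairs) if ref != obs]


# Toy sequences, for illustration only.
reference_seq = "ATGCGT"
sample_seq = "ATGTGT"
print(find_differences(sample_seq, reference_seq))
```

An empty list would indicate that no differences from the reference were detected over the compared region.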

Another aspect of the invention is a method of treating conditions which are related to insufficient PSP1 or D87258 
protein function which comprises: 

(a) isolating cells from a patient deficient in PSP1 or D87258 protein function; 

(b) altering the cells by transfecting the polynucleotide of the invention or D87258 into the cells wherein a PSP1 
or D87258 protein is expressed; and 

(c) introducing the cells back to the patient to alleviate the condition. 

Another aspect of the invention is a method of treating conditions which are related to insufficient PSP1 or D87258 
protein function which comprises administering the polynucleotide of the invention to a patient deficient in PSP1 protein
function wherein a PSP1 or D87258 protein is expressed and alleviates the condition. 

Another aspect of the invention is an antibody immunoreactive with PSP1 or D87258 or an immunogen thereof. 

Another aspect of the invention is a transgenic non-human animal capable of expressing in any cell thereof the 
polynucleotide of the invention. 

Another aspect of the invention is a method for determining the genetic predisposition to neurodegeneration in a
patient comprising detecting PSP1 or D87258 polymorphisms in a sample from a patient. Yet another aspect of the
invention is an isolated polynucleotide having the nucleotide sequence as set forth in SEQ ID NO: 32, 33, 34, 35, 36, 37,
38, 39, or 40.

Figure 1 is an amino acid sequence alignment of PSP1-1 with E. coli htrA.

Figure 2 is a multiple cDNA sequence alignment of the PSP1 isolates PSP1-1, PSP1-2, PSP1-3 and PSP1-4.

Figure 3 is an amino acid sequence alignment of PSP1-1 with a putative human serine protease. 

As used herein, the term "PSP1 polynucleotide" or "PSP1" refers to DNA molecules comprising a nucleotide sequence
that encodes PSP1 and alternative splice variants, i.e., homologs and isoforms, and polymorphic variants.
PSP1 binds to a region encompassing amino acids 269-413 of the human PS-1 protein, contains a conserved serine
protease motif and exhibits homology to the E. coli serine protease htrA described by Lipinska et al. in Nucl. Acids
Res. 16, 10053-10066 (1988) and a putative human serine protease with an IGF-binding motif (Ohno, I., et al., Genbank
Accession No. D87258 (1996)), hereinafter referred to as D87258.

The PSP1-1 sequence is listed in SEQ ID NO: 24. The coding region of this sequence consists of nucleotides 
603-1979 of SEQ ID NO: 24. The deduced 458 amino acid sequence of the encoded product PSP1-1 is listed in SEQ 
ID NO: 25. 

The PSP1-1 sequence listed in SEQ ID NO: 30 includes two polymorphic variants, at nucleotides 672 (C/T) and 
1435 (C/T) resulting in alternative amino acid residues at position 24 (arg/cys) and 278 (ala/val), both in the conserved
region of nucleotides 1-1540. The deduced 458 amino acid sequence of the encoded product PSP1-1 is listed in SEQ
ID NO: 31. 
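
The correspondence between the polymorphic nucleotides and the affected residues can be checked arithmetically. The sketch below assumes that the coding region of SEQ ID NO: 30 begins at nucleotide 603, as stated for SEQ ID NO: 24, and that residues are numbered from the initiating codon; the function name `codon_position` is hypothetical.

```python
CDS_START = 603  # first coding nucleotide (assumed to apply to SEQ ID NO: 30)


def codon_position(nucleotide, cds_start=CDS_START):
    """Map a 1-based nucleotide position to (residue number, base within codon)."""
    offset = nucleotide - cds_start
    if offset < 0:
        raise ValueError("nucleotide lies upstream of the coding region")
    return offset // 3 + 1, offset % 3 + 1


for nt in (672, 1435):
    residue, base_in_codon = codon_position(nt)
    print(nt, residue, base_in_codon)
# 672  -> residue 24, first base of the codon
# 1435 -> residue 278, second base of the codon
```

Under that assumption, nucleotide 672 falls in the first base of codon 24 and nucleotide 1435 in the second base of codon 278, in agreement with the residue positions given above.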

The PSP1-2 sequence is listed in SEQ ID NO: 23. The coding region of this sequence consists of nucleotides 
603-1979 of SEQ ID NO: 23. The deduced 458 amino acid sequence of the encoded product PSP1-2 is listed in SEQ 
ID NO: 8. The PSP1-3 sequence is listed in SEQ ID NO: 26. The coding region of this sequence consists of nucleotides
603-1736 of SEQ ID NO: 26. The deduced 377 amino acid sequence of the encoded product PSP1-3 is listed in SEQ
ID NO: 27. The PSP1-4 sequence is listed in SEQ ID NO: 28. The coding region of this sequence consists of nucleotides
603-1913 of SEQ ID NO: 28. The deduced 436 amino acid sequence of the encoded product PSP1-4 is listed in SEQ
ID NO: 29. 

The D87258 sequence is listed in SEQ ID NO: 17. The coding region of this sequence consists of nucleotides 
49-1491 of SEQ ID NO: 17. The deduced 480 amino acid sequence of the encoded product D87258 is listed in SEQ 
ID NO: 18. The D87258 sequence listed in SEQ ID NO: 17 includes a polymorphic variant at nucleotide 1325 (G/T) 
resulting in alternative amino acid residues at position 213 (gly/val). The sequence in Genbank Accession No. D87258
(1996) describes only 1325G. The novel polynucleotide polymorph of D87258 having 1325T is hereinafter referred
to as D87258 (1325T) and the novel encoded product having valine at 213 is D87258 (1325T) protein. The novel
polynucleotide D87258 (1325T) and its encoded protein can replace PSP1 in any of the compositions, uses or methods
herein described and such novel polynucleotide, encoded protein, compositions, uses and methods also form part of the
invention. 
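
The stated coding regions and deduced protein lengths can be cross-checked with simple arithmetic. The following sketch assumes that each stated coding region is inclusive of both end positions and ends with a stop codon that is not counted as a residue; the dictionary and function names are hypothetical.

```python
# Coding-region coordinates as stated in the description (1-based, inclusive).
CODING_REGIONS = {
    "PSP1-1": (603, 1979),
    "PSP1-2": (603, 1979),
    "PSP1-3": (603, 1736),
    "PSP1-4": (603, 1913),
    "D87258": (49, 1491),
}


def encoded_residues(start, end):
    """Residues encoded by a coding region, assuming a terminal stop codon."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("coding region length is not a multiple of three")
    return length // 3 - 1  # the stop codon does not encode a residue


for name, (start, end) in CODING_REGIONS.items():
    print(name, encoded_residues(start, end))
# PSP1-1 458, PSP1-2 458, PSP1-3 377, PSP1-4 436, D87258 480
```

On these assumptions the computed lengths agree with the deduced 458, 458, 377, 436 and 480 amino acid sequences recited above.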

As used herein, the term "functional fragments" when used to modify a specific gene or gene product means a 
less than full length portion of the gene or gene product which retains substantially all of the biological function asso- 
ciated with the full length gene or gene product to which it relates. An example of a functional fragment of PSP1 is the 
minimal catalytic domain. To determine whether a fragment of a particular gene or gene product is a functional fragment,
fragments are generated by well-known nucleolytic or proteolytic techniques or by the polymerase chain reaction and 
the fragments tested for the described biological function. 

As used herein, an "antigen" refers to a molecule containing one or more epitopes that will stimulate a host's
immune system to make a humoral and/or cellular antigen-specific response. The term is also used herein interchangeably
with "immunogen."

As used herein, the term "epitope" refers to the site on an antigen or hapten to which a specific antibody molecule 
binds. The term is also used herein interchangeably with "antigenic determinant" or "antigenic determinant site." 

As used herein, "monoclonal antibody" is understood to include antibodies derived from one species (e.g., murine,
rabbit, goat, rat, human, etc.) as well as antibodies derived from two (or perhaps more) species (e.g., chimeric and 
humanized antibodies). 

As used herein, a coding sequence is "operably linked to" another coding sequence when RNA polymerase will 
transcribe the two coding sequences into a single mRNA, which is then translated into a single polypeptide having 
amino acids derived from both coding sequences. The coding sequences need not be contiguous to one another so 
long as the expressed sequence is ultimately processed to produce the desired protein. 

As used herein, "recombinant" polypeptides refer to polypeptides produced by recombinant DNA techniques; i.e., 
produced from cells transformed by an exogenous DNA construct encoding the desired polypeptide. "Synthetic" 
polypeptides are those prepared by chemical synthesis. 

As used herein, a "replicon" is any genetic element (e.g., plasmid, chromosome, virus) that functions as an autonomous
unit of DNA replication in vivo; i.e., capable of replication under its own control.

As used herein, a "vector" is a replicon, such as a plasmid, phage, or cosmid, to which another DNA segment may
be attached so as to bring about the replication of the attached segment. 

As used herein, a "reference" gene refers to the wild type PSP1 sequence of the invention and is understood to
include the various sequence polymorphisms that exist, wherein nucleotide substitutions in the gene sequence exist,
but do not affect the essential function of the gene product.

As used herein, a "mutant" gene refers to PSP1 sequences different from the reference gene wherein nucleotide 
substitutions and/or deletions and/or insertions result in perturbation of the essential function of the gene product. 

As used herein, a DNA "coding sequence of" or a "nucleotide sequence encoding" a particular protein, is a DNA
sequence which is transcribed and translated into a polypeptide when placed under the control of appropriate regulatory 
sequences. 

As used herein, a "promoter sequence" is a DNA regulatory region capable of binding RNA polymerase in a cell 
and initiating transcription of a downstream (3' direction) coding sequence. For purposes of defining the present invention,
the promoter sequence is bound at its 3' terminus by a translation start codon (e.g., ATG) of a coding sequence
and extends upstream (5' direction) to include the minimum number of bases or elements necessary to initiate transcription
at levels detectable above background. Within the promoter sequence will be found a transcription initiation
site (conveniently defined by mapping with nuclease S1), as well as protein binding domains (consensus sequences) 
responsible for the binding of RNA polymerase. Eukaryotic promoters will often, but not always, contain "TATA" boxes 
and "CAT" boxes. Prokaryotic promoters contain Shine-Dalgarno sequences in addition to the -10 and -35 consensus 
sequences. 

As used herein, DNA "control sequences" refers collectively to promoter sequences, ribosome binding sites, poly- 
adenylation signals, transcription termination sequences, upstream regulatory domains, enhancers and the like, which 
collectively provide for the expression (i.e., the transcription and translation) of a coding sequence in a host cell. 

As used herein, a control sequence "directs the expression" of a coding sequence in a cell when RNA polymerase
will bind the promoter sequence and transcribe the coding sequence into mRNA, which is then translated into the 
polypeptide encoded by the coding sequence. 

As used herein, a "host cell" is a cell which has been transformed or transfected, or is capable of transformation
or transfection by an exogenous DNA sequence. 

As used herein, a cell has been "transformed" by exogenous DNA when such exogenous DNA has been introduced 
inside the cell membrane. Exogenous DNA may or may not be integrated (covalently linked) into chromosomal DNA 
making up the genome of the cell. In prokaryotes and yeasts, for example, the exogenous DNA may be maintained 
on an episomal element, such as a plasmid. With respect to eukaryotic cells, a stably transformed or transfected cell 
is one in which the exogenous DNA has become integrated into the chromosome so that it is inherited by daughter 
cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish 
cell lines or clones comprised of a population of daughter cells containing the exogenous DNA. 

As used herein, "transfection" or "transfected" refers to a process by which cells take up foreign DNA and integrate 
that foreign DNA into their chromosome. Transfection can be accomplished, for example, by various techniques in 
which cells take up DNA (e.g., calcium phosphate precipitation, electroporation, assimilation of liposomes, etc.) or by 
infection, in which viruses are used to transfer DNA into cells. 

As used herein, a "target cell" is a cell that is selectively transfected over other cell types (or cell lines). 

As used herein, a "clone" is a population of cells derived from a single cell or common ancestor by mitosis. A "cell
line" is a clone of a primary cell that is capable of stable growth in vitro for many generations. 

As used herein, a "heterologous" region of a DNA construct is an identifiable segment of DNA within or attached 
to another DNA molecule that is not found in association with the other molecule in nature. Thus, when the heterologous 
region encodes a gene, the gene will usually be flanked by DNA that does not flank the gene in the genome of the 
source animal. Another example of a heterologous coding sequence is a construct where the coding sequence itself 
is not found in nature (e.g., synthetic sequences having codons different from the native gene). Allelic variation or 
naturally occurring mutational events do not give rise to a heterologous region of DNA, as used herein. 

As used herein, a "modulator" of a polypeptide is a substance which can affect the polypeptide function, such as 
an inhibitor of enzymatic activity. 

An aspect of the present invention is isolated polynucleotides encoding a PSP1 protein and substantially similar 
sequences. Isolated polynucleotide sequences are substantially similar if they are capable of hybridizing under moderately
stringent conditions to SEQ ID NOs: 23, 24, 26 or 28 or they encode DNA sequences which are degenerate to
SEQ ID NOs: 23, 24, 26 or 28 or are degenerate to those sequences capable of hybridizing under moderately stringent
conditions to SEQ ID NOs: 23, 24, 26 or 28. 

Moderately stringent conditions is a term understood by the skilled artisan and has been described in, for example, 
Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd edition, Vol. 1, pp. 101-104, Cold Spring Harbor Laboratory
Press (1989). An exemplary hybridization protocol using moderately stringent conditions is as follows. Nitrocellulose
filters are prehybridized at 65°C in a solution containing 6X SSPE, 5X Denhardt's solution (10g Ficoll, 10g BSA
and 10g polyvinylpyrrolidone per liter solution), 0.05% SDS and 100 µg/ml tRNA. Hybridization probes are labeled,
preferably radiolabelled (e.g., using the Bios TAG-IT® kit). Hybridization is then carried out for approximately 18 hours
at 65°C. The filters are then washed twice in a solution of 2X SSC and 0.5% SDS at room temperature for 15 minutes.
Subsequently, the filters are washed at 58°C, air-dried and exposed to X-ray film overnight at -70°C with an intensifying
screen. 

Degenerate DNA sequences encode the same amino acid sequence as SEQ ID NOs: 8, 25, 27 or 29 or the proteins
encoded by that sequence capable of hybridizing under moderately stringent conditions to SEQ ID NOs: 8, 25, 27, 29, 
but have variation(s) in the nucleotide coding sequences because of the degeneracy of the genetic code. For example, 
the degenerate codons UUC and UUU both code for the amino acid phenylalanine, whereas the four codons GGX, 
where X = U, C, A, or G, all code for glycine. 
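
The degeneracy described above can be illustrated with a short sketch. Only the six codons named in the preceding paragraph are included in the lookup table; the function name `translate` and the two toy RNA sequences are hypothetical and are not PSP1 sequences.

```python
# Subset of the standard genetic code covering only the codons named above.
CODON_TABLE = {
    "UUU": "Phe", "UUC": "Phe",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
}


def translate(rna):
    """Translate an RNA string codon by codon using the subset table."""
    if len(rna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return [CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna), 3)]


# Two toy sequences that differ at the third base of each codon.
seq_a = "UUUGGA"
seq_b = "UUCGGG"
print(translate(seq_a), translate(seq_b))
print(translate(seq_a) == translate(seq_b))
```

The two sequences differ in nucleotide sequence yet encode the same residues, which is the sense in which one is degenerate to the other.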

Alternatively, substantially similar sequences are defined as those nucleotide sequences encoding proteins having 
PSP1 activity in which about 70%, preferably about 80%, and most preferably about 90%, of the nucleotides share 
identity with PSP1, i.e., a sequence encoding a protein having PSP1 activity is substantially similar to any of SEQ ID 
NOs: 23, 24, 26 or 28 when at least about 70% of all of the nucleotides of the sequence match SEQ ID NOs: 23, 24, 
26 or 28. Nucleotide sequences that are substantially similar can be identified by hybridization or by sequence com- 
parison. 
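
Identification by sequence comparison can be sketched as a percent-identity calculation. This is a simplified illustration, assuming the two sequences have already been aligned to equal length and treating any gap character as a mismatch; the function names, the fixed 70% cutoff (the description says "about 70%") and the toy sequences are assumptions made for the example only.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions at which the two sequences are identical."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    pairs = zip(seq_a.upper(), seq_b.upper())
    matches = sum(1 for a, b in pairs if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(seq_a, seq_b, threshold=70.0):
    """True when identity reaches the chosen cutoff (70, 80 or 90 in the text)."""
    return percent_identity(seq_a, seq_b) >= threshold


# Toy aligned sequences, for illustration only.
reference_seq = "ATGGCCAAGT"
candidate_seq = "ATGGCTAAGA"
print(percent_identity(reference_seq, candidate_seq))
print(meets_identity_threshold(reference_seq, candidate_seq, threshold=90.0))
```

Identity alone does not establish substantial similarity as defined above, since the definition also requires that the encoded protein have PSP1 activity.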

Embodiments of the isolated polynucleotides of the invention include DNA, genomic DNA and RNA, preferably of 
human origin. A method for isolating a nucleic acid molecule encoding a PSP1 protein is to probe a genomic or cDNA 
library with a natural or artificially designed probe using art recognized procedures. See, e.g., "Current Protocols in 
Molecular Biology", Ausubel et al. (eds.) Greene Publishing Association and John Wiley Interscience, New York,
1989, 1992. The ordinarily skilled artisan will appreciate that SEQ ID NOs: 23, 24, 26 or 28 or fragments thereof comprising
at least 15 contiguous nucleotides are particularly useful probes. It is also appreciated that such probes can
be and are preferably labeled with an analytically detectable reagent to facilitate identification of the probe. Useful 
reagents include, but are not limited to, radioisotopes, fluorescent dyes or enzymes capable of catalyzing the formation 
of a detectable product. The probes would enable the ordinarily skilled artisan to isolate complementary copies of 
genomic DNA, cDNA or RNA polynucleotides encoding PSP1 proteins from human, mammalian or other animal sourc- 
es or to screen such sources for related sequences, e.g., additional members of the family, type and/or subtype, in- 
cluding transcriptional regulatory and control elements as well as other stability, processing, translation and tissue 
specificity-determining regions from 5' and/or 3' regions relative to the coding sequences disclosed herein, all without 
undue experimentation. 
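
The notion of a probe comprising at least 15 contiguous nucleotides can be illustrated by enumerating candidate fragments of a sequence. This is a minimal sketch only; the function names and the 20-nucleotide toy sequence are hypothetical, and no probe-selection criteria (melting temperature, specificity and the like) are applied.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def candidate_probes(seq, length=15):
    """List every contiguous fragment of the given length from the sequence."""
    seq = seq.upper()
    if length < 1 or length > len(seq):
        return []
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


# Toy sequence, for illustration only (not a PSP1 sequence).
toy_sequence = "ATGGCCAAGTTTGGACCCTA"
probes = candidate_probes(toy_sequence)
print(len(probes), probes[0], reverse_complement(probes[0]))
```

Either strand of a fragment could serve as a probe depending on which strand of the target is to be detected, which is why the reverse complement is shown alongside.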

Another aspect of the invention is functional polypeptides encoded by the polynucleotides of the invention and 
substantially similar polypeptides. An embodiment of a functional polypeptide of the invention is the PSP1 protein 
having the amino acid sequence set forth in SEQ ID NO: 8, 25, 27 or 29. 

Polypeptide sequences that are substantially similar are those sequences having PSP1 activity in which about 50%,
preferably about 70%, and most preferably about 90%, of the amino acids share identity with PSP1, i.e., a sequence representing
a polypeptide having PSP1 activity is substantially similar to any of SEQ ID NOs: 8, 25, 27 or 29 when at least
about 50% of all of the amino acids of the sequence match SEQ ID NOs: 8, 25, 27 or 29. Substantially similar polypeptide
sequences can be identified by techniques such as proteolytic digestion, gel electrophoresis, microsequencing and/ 
or sequence comparison, e.g., through use of the GAP algorithm available from the University of Wisconsin Genetics 
Computer Group. 

Another aspect of the invention is a method for preparing essentially pure PSP1 protein. Yet another aspect is the 
PSP1 protein produced by the preparation method of the invention. This protein has the amino acid sequence listed 
in SEQ ID NOs: 8, 25, 27 or 29 and includes variants with a substantially similar amino acid sequence that have the 
same function. The proteins of this invention are preferably made by recombinant genetic engineering techniques by 
culturing a recombinant host cell containing a vector encoding the polynucleotides of the invention under conditions 
promoting the expression of the protein and recovery thereof. 

The isolated polynucleotides, particularly the DNAs, can be introduced into expression vectors by operatively link- 
ing the DNA to the necessary expression control regions, e.g., regulatory regions, required for gene expression. The 
vectors can be introduced into an appropriate host cell such as a prokaryotic, e.g., bacterial, or eukaryotic, e.g., yeast 
or mammalian cell by methods well known in the art. See Ausubel et al., supra. The coding sequences for the desired
proteins, having been prepared or isolated, can be cloned into any suitable vector or replicon. Numerous cloning vectors
are known to those of skill in the art and the selection of an appropriate cloning vector is a matter of choice. Examples
of recombinant DNA vectors for cloning and host cells which they can transform include, but are not limited to: the
bacteriophage λ (E. coli), pBR322 (E. coli), pACYC177 (E. coli), pKT230 (gram-negative bacteria), pGV1106 (gram-negative
bacteria), pLAFR1 (gram-negative bacteria), pME290 (non-E. coli gram-negative bacteria), pHV14 (E. coli
and Bacillus subtilis), pBD9 (Bacillus), pIJ61 (Streptomyces), pUC6 (Streptomyces), YIp5 (Saccharomyces), a baculovirus
insect cell system, a Drosophila insect system, YCp19 (Saccharomyces) and pSV2neo (mammalian cells). See
generally, "DNA Cloning": Vols. I & II, Glover et al., ed. IRL Press Oxford (1985) (1987); and T. Maniatis et al.,
"Molecular Cloning", Cold Spring Harbor Laboratory (1982).

The gene can be placed under the control of control elements such as a promoter, ribosome binding site (for 
bacterial expression) and, optionally, an operator, so that the DNA sequence encoding the desired protein is transcribed
into RNA in the host cell transformed by a vector containing the expression construct. The coding sequence may or
may not contain a signal peptide or leader sequence. The proteins of the present invention can be expressed using,
for example, the E. coli tac promoter or the protein A gene (spa) promoter and signal sequence. Leader sequences
can be removed by the bacterial host in post-translational processing. See, e.g., U.S. Patent Nos. 4,431,739; 4,425,437
and 4,338,397. 

In addition to control sequences, it may be desirable to add regulatory sequences which allow for regulation of the
expression of the protein sequences relative to the growth of the host cell. Regulatory sequences are known to those 
of skill in the art. Exemplary are those which cause the expression of a gene to be turned on or off in response to a 
chemical or physical stimulus, including the presence of a regulatory compound or to various temperature or metabolic 
conditions. Other types of regulatory elements may also be present in the vector, for example, enhancer sequences. 

An expression vector is constructed so that the particular coding sequence is located in the vector with the appro- 
priate regulatory sequences, the positioning and orientation of the coding sequence with respect to the control se- 
quences being such that the coding sequence is transcribed under the "control" of the control sequences, i.e., RNA 
polymerase which binds to the DNA molecule at the control sequences transcribes the coding sequence. Modification 
of the sequences encoding the particular antigen of interest may be desirable to achieve this end. For example, in 
some cases it may be necessary to modify the sequence so that it may be attached to the control sequences with the 
appropriate orientation; i.e., to maintain the reading frame. The control sequences and other regulatory sequences 
may be ligated to the coding sequence prior to insertion into a vector, such as the cloning vectors described above. 
Alternatively, the coding sequence can be cloned directly into an expression vector which already contains the control 
sequences and an appropriate restriction site. 
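
The requirement to maintain the reading frame when a coding sequence is joined downstream of other translated sequence can be illustrated with simple arithmetic. The sketch below is an assumption-laden illustration, not a cloning procedure: the function name and the example lengths are hypothetical, and the only rule applied is that the number of translated nucleotides preceding the junction must be a multiple of three.

```python
def bases_to_restore_frame(upstream_length):
    """Extra nucleotides (0, 1 or 2) needed to reach the next codon boundary.

    upstream_length -- number of translated nucleotides that precede the
    joined coding sequence (e.g., a leader sequence plus any linker).
    """
    if upstream_length < 0:
        raise ValueError("length cannot be negative")
    return (-upstream_length) % 3


# Hypothetical upstream lengths, for illustration only.
for length in (66, 67, 68):
    print(length, bases_to_restore_frame(length))
```

A result of zero indicates that the downstream coding sequence would already be read in frame under this simple rule.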

In some cases, it may be desirable to produce mutants or analogues of PSP1 protein. Mutants or analogues may 
be prepared by the deletion of a portion of the sequence encoding the protein, by insertion of a sequence, and/or by 
substitution of one or more nucleotides within the sequence. Techniques for modifying nucleotide sequences, such as 
site-directed mutagenesis, are well known to those skilled in the art. See, e.g., T. Maniatis et al., supra; "DNA Cloning,"
Vols. I and II, supra; and "Nucleic Acid Hybridization", supra.

Depending on the expression system and host selected, the proteins of the present invention are produced by 
growing host cells transformed by an expression vector described above under conditions whereby the protein of 
interest is expressed. Preferred mammalian cells include human embryonic kidney cells (293), monkey kidney cells, 
fibroblast (COS) cells, Chinese hamster ovary (CHO) cells, Drosophila or murine L-cells. If the expression system 
secretes the protein into growth media, the protein can be purified directly from the media. If the protein is not secreted, 
it is isolated from cell lysates or recovered from the cell membrane fraction. The selection of the appropriate growth 
conditions and recovery methods are within the skill of the art. 

An alternative method to identify proteins of the present invention is by constructing gene libraries, using the resulting
clones to transform E. coli and pooling and screening individual colonies using polyclonal serum or monoclonal
antibodies to PSP1.

The proteins of the present invention may also be produced by chemical synthesis such as solid phase peptide 
synthesis on an automated peptide synthesizer, using known amino acid sequences or amino acid sequences derived 
from the DNA sequence of the genes of interest. Such methods are known to those skilled in the art. 

The proteins of the present invention or their immunogenic fragments comprising at least one epitope can be used 
to produce antibodies, both polyclonal and monoclonal, directed to epitopes corresponding to amino acid sequences 
disclosed herein. If polyclonal antibodies are desired, a selected mammal such as a mouse, rabbit, goat or horse is 
immunized with a protein of the present invention, or its fragment, or a mutant protein. Serum from the immunized 
animal is collected and treated according to known procedures. Serum polyclonal antibodies can be purified by immu- 
noaffinity chromatography or other known procedures. 

Monoclonal antibodies to the proteins of the present invention, and to the immunogenic fragments thereof, can 
also be readily produced by one skilled in the art. The general methodology for making monoclonal antibodies by using 
hybridoma technology is well known. Immortal antibody-producing cell lines can be created by cell fusion and also by 
other techniques such as direct transformation of B lymphocytes with oncogenic DNA or transfection with Epstein-Barr 
virus. See, e.g., M. Schreier et al., "Hybridoma Techniques" (1980); Hammerling et al., "Monoclonal Antibodies and
T-cell Hybridomas" (1981); Kennett et al., "Monoclonal Antibodies" (1980); and U.S. Patent Nos. 4,341,761; 4,399,121;
4,427,783; 4,444,887; 4,452,570; 4,466,917; 4,472,500; 4,491,632; and 4,493,890. Panels of monoclonal antibodies 
produced against the antigen of interest, or fragment thereof, can be screened for various properties, i.e., for isotype, 
epitope, affinity, etc. Monoclonal antibodies are useful in purification, using immunoaffinity techniques, of the individual 
antigens which they are directed against. Alternatively, genes encoding the monoclonals of interest may be isolated 
from the hybridomas by PCR techniques known in the art and cloned and expressed in the appropriate vectors. The
antibodies of this invention, whether polyclonal or monoclonal, have additional utility in that they may be employed as
reagents in immunoassays, RIA, ELISA, and the like. The antibodies of the invention can be labeled with an analytically 
detectable reagent such as a radioisotope, fluorescent molecule or enzyme. 

Chimeric antibodies, in which non-human variable regions are joined or fused to human constant regions (see, e.g.,
Liu et al., Proc. Natl. Acad. Sci. USA, 84, 3439 (1987)), may also be used in assays or therapeutically. Preferably,
a therapeutic monoclonal antibody would be "humanized" as described in Jones et al., Nature, 321, 522 (1986);
Verhoeyen et al., Science, 239, 1534 (1988); Kabat et al., J. Immunol., 147, 1709 (1991); Queen et al., Proc. Natl. Acad.
Sci. USA, 86, 10029 (1989); Gorman et al., Proc. Natl. Acad. Sci. USA, 88, 4181 (1991); and Hodgson et al.,
Bio/Technology, 9, 421 (1991).

Another aspect of the present invention is modulators of the polypeptides of the invention or of D87258. Functional 
modulation of PSP1 or D87258 by a substance includes partial to complete inhibition of function, such as inhibition of 
proteolytic activity, identical function, as well as enhancement of function. Embodiments of modulators of the invention 
include peptides, oligonucleotides and small organic molecules including peptidomimetics. Modulators of the invention 
may be useful as therapeutics or prophylactics for all forms of neurodegeneration including AD. Modulators of PSP1 
or D87258 proteolytic activity relative to other endogenous substrates may also be useful for the treatment of other
types of human disease states. 

Another aspect of the invention is antisense oligonucleotides comprising a sequence which is capable of binding 
to the polynucleotides of the invention. Synthetic oligonucleotides or related antisense chemical structural analogs can 
be designed by those of ordinary skill in the art to recognize, specifically bind to and prevent transcription of a target
nucleic acid encoding PSP1 or D87258 protein. See generally, Cohen, J.S., Trends in Pharm. Sci., 10, 435 (1989) and
Weintraub, H.M., Scientific American, January (1990) at page 40.

Another aspect of the invention is a method for assaying a medium for the presence of a substance that modulates 
PSP1 or D87258 protein function by affecting the binding of PSP1 or D87258 protein to cellular binding partners. 
Examples of modulators include, but are not limited to peptides and small organic molecules including peptidomimetics. 
A PSP1 or D87258 protein is provided having the amino acid sequence of PSP1 (SEQ ID NOs: 8, 25, 27 or 29) or 
D87258 (SEQ ID NO: 18) or a functional fragment thereof together with a cellular binding partner or synthetic analog 
thereof. The mixture is incubated with a test substance which is suspected of modulating PSP1 or D87258 activity,
under conditions which permit the formation of a PSP1 or D87258 gene product/cellular binding partner complex. An
assay is performed for the presence of the complex, free PSP1 or D87258 protein or free cellular binding partner and
the result compared to a control to determine the effect of the test substance. 

Another aspect of the invention is a method for assaying a medium for the presence of a substance that modulates 
PSP1 or D87258 protein function by inhibiting its proteolytic activity on cellular substrates. Examples of modulators 
include, but are not limited to peptides and small organic molecules including peptidomimetics. Cellular substrates can 
include PS-1, PS-2, APP or other substrates. A PSP1 or D87258 protein is provided having the amino acid sequence
of PSP1 (SEQ ID NOs: 8, 25, 27 or 29) or D87258 (SEQ ID NO: 18) or a functional fragment thereof together with a 
cellular substrate or synthetic analog thereof. The mixture is incubated with a test substance which is suspected of 
inhibiting PSP1 or D87258 activity, under conditions which permit the formation of a PSP1 or D87258 enzyme/substrate 
complex and subsequent cleavage of the substrate. 

Another aspect of the invention is a method for assaying for the presence of a substance that modulates PSP1 or 
D87258 activity by direct binding to PSP1 or D87258 protein. Examples of modulators include, but are not limited to, 
peptides and small organic molecules including peptidomimetics. Modulator candidates are synthesized on a solid 
support by techniques such as those disclosed in Lam et al., Nature 354, 82 (1991) or Burbaum et al., Proc. Natl. Acad.
Sci. USA 92, 6027 (1995) to provide solid support-associated modulator candidates. A labelled PSP1 or D87258 protein
is provided having the amino acid sequence of PSP1 (SEQ ID NOs: 8, 25, 27 or 29) or D87258 (SEQ ID NO: 18) or a
functional derivative thereof. Exemplary labels include directly attached fluorescent or colored dyes, biotin, radioisotopes
or epitope tags, which are detectable by a suitable antibody. A mixture of solid support-associated modulator
candidates and labelled PSP1 or D87258 protein is incubated under conditions which can permit the formation of a 
PSP1 or D87258 protein/modulator candidate complex. The solid support is separated from free soluble labelled PSP1 
or D87258 protein. An assay is performed for the presence of solid support-associated labelled protein. Solid supports 
complexed with labelled protein are isolated and the identity of the modulator candidate determined by techniques well 
known to those skilled in the art, such as the TOF-SIMS method in Brummel et al., Science 264, 399-402 (1994).

Modulation of PSP1 or D87258 function would be expected to have effects on presenilin cleavage, the cleavage
of other proteins or βA4 production. Any modulators so identified would be expected to be useful in
the treatment and prevention of neurodegeneration including FAD and AD.

Further, PSP1 or D87258 could be used to isolate proteins which interact with it and this interaction could be a 
target for interference. Inhibitors of protein-protein interactions between PSP1 or D87258 and other factors could lead
to the development of pharmaceutical agents for the modulation of PSP1 or D87258 activity. 

Methods to assay for protein-protein interactions, such as that of a PSP1 or D87258 gene product/binding partner
complex, and to isolate proteins interacting with PSP1 or D87258 are known to those skilled in the art. Use of the
methods discussed below enables one of ordinary skill in the art to accomplish these aims without undue experimentation.

The yeast two-hybrid system provides methods for detecting the interaction between a first test protein and a 
second test protein, in vivo, using reconstitution of the activity of a transcriptional activator. The method is disclosed
in U.S. Patent No. 5,283,173; reagents are available from Clontech and Stratagene. Briefly, PSP1 cDNA is fused to a
Gal4 or LexA transcription factor DNA binding domain and expressed in yeast cells. cDNA library members obtained
from cells of interest are fused to a transactivation domain of Gal4 or another transactivation domain. cDNA clones
which express proteins which can interact with PSP1 will lead to reconstitution of transcription factor activity such as
Gal4 and transactivation of a reporter gene expression such as GAL1-lacZ.

An alternative method is screening of λgt11, λZAP (Stratagene) or equivalent cDNA expression libraries with recombinant
PSP1. Recombinant PSP1 protein or fragments thereof are fused to small peptide tags such as FLAG, HSV
or GST. The peptide tags can possess convenient phosphorylation sites for a kinase such as heart muscle creatine
kinase or they can be biotinylated. Recombinant PSP1 can be phosphorylated with [32P] or used unlabeled and detected
with streptavidin or antibodies against the tags. λgt11 cDNA expression libraries are made from cells of interest
and are incubated with the recombinant PSP1, washed and cDNA clones isolated which interact with PSP1. See, e.g.,
T. Maniatis et al., supra.

Another method is the screening of a mammalian expression library in which the cDNAs are cloned into a vector 
between a mammalian promoter and polyadenylation site and transiently transfected in COS or 293 cells followed by 
detection of the binding protein 48 hours later by incubation of fixed and washed cells with a labelled PSP1, preferably
iodinated, and detection of bound PSP1 by autoradiography (see Sims et al., Science 241, 585-589 (1988) and McMahan
et al., EMBO J. 10, 2821-2832 (1991)). In this manner, pools of cDNAs containing the cDNA encoding the
binding protein of interest can be selected and the cDNA of interest can be isolated by further subdivision of each pool 
followed by cycles of transient transfection, binding and autoradiography. Alternatively, the cDNA of interest can be 
isolated by transfecting the entire cDNA library into mammalian cells and panning the cells on a dish containing PSP1 
bound to the plate. Cells which attach after washing are lysed and the plasmid DNA isolated, amplified in bacteria, and 
the cycle of transfection and panning repeated until a single cDNA clone is obtained (see Seed et al., Proc. Natl. Acad.
Sci. USA 84, 3365 (1987) and Aruffo et al., EMBO J. 6, 3313 (1987)). If the binding protein is secreted, its cDNA can
be obtained by a similar pooling strategy once a binding or neutralizing assay has been established for assaying 
supernatants from transiently transfected cells. General methods for screening supernatants are disclosed in Wong et
al., Science 228, 810-815 (1985).

Another alternative method is isolation of proteins interacting with PSP1 directly from cells. Fusion proteins of 
PSP1 with GST or small peptide tags are made and immobilized on beads. Biosynthetically labeled or unlabeled protein 
extracts from the cells of interest are prepared, incubated with the beads and washed with buffer. Proteins interacting 
with PSP1 are eluted specifically from the beads and analyzed by SDS-PAGE. Binding partner primary amino acid 
sequence data are obtained by microsequencing. Optionally, the cells can be treated with agents that induce a functional 
response such as tyrosine phosphorylation of cellular proteins. An example of such an agent would be a growth factor 
or cytokine such as erythropoietin or interleukin-3. 

Another alternative method is immunoaffinity purification. Recombinant PSP1 is incubated with labeled or unla- 
beled cell extracts and immunoprecipitated with anti-PSP1 antibodies. The immunoprecipitate is recovered with protein 
A-Sepharose and analyzed by SDS-PAGE. Unlabeled proteins are labeled by biotinylation and detected on SDS gels
with streptavidin. Binding partner proteins are analyzed by microsequencing. Further, standard biochemical purification 
steps known to those skilled in the art may be used prior to microsequencing. 

Yet another alternative method is screening of peptide libraries for binding partners. Recombinant tagged or labeled 
PSP1 is used to select peptides from a peptide or phosphopeptide library which interact with PSP1. Sequencing of the
peptides leads to identification of consensus peptide sequences which might be found in interacting proteins. 

PSP1 or D87258 binding partners identified by any of these methods or other methods which would be known to 
those of ordinary skill in the art as well as those putative binding partners discussed above can be used in the assay 
method of the invention. Assaying for the presence of PSP1 or D87258/binding partner complex is accomplished
by, for example, the yeast two-hybrid system, ELISA or immunoassays using antibodies specific for the complex. In
the presence of test substances which interrupt or inhibit formation of PSP1 or D87258/binding partner interaction, a
decreased amount of complex will be determined relative to a control lacking the test substance. 
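
The comparison to a control described above can be expressed as a percent decrease in complex signal. The following is a minimal sketch only; the function name and the readings are hypothetical and are not part of the disclosed assay.

```python
def percent_inhibition(signal_with_test: float, signal_control: float) -> float:
    """Percent decrease in complex signal relative to a control lacking the test substance."""
    if signal_control <= 0:
        raise ValueError("control signal must be positive")
    return 100.0 * (signal_control - signal_with_test) / signal_control


# Hypothetical ELISA readings in arbitrary absorbance units, for illustration only.
control_reading = 1.20
test_reading = 0.30
print(round(percent_inhibition(test_reading, control_reading), 1))
```

A positive value indicates less complex than in the control, as expected for a substance that interrupts the interaction; a negative value would indicate more complex than in the control.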

Assays for free PSP1 or D87258 or free binding partner are accomplished by, for example, ELISA or immunoassay
using specific antibodies or by incubation of radiolabeled PSP1 or D87258 with cells or cell membranes followed by
centrifugation or filter separation steps. In the presence of test substances which interrupt or inhibit formation of PSP1
or D87258/binding partner interaction, an increased amount of free PSP1 or D87258, or free binding partner will be
determined relative to a control lacking the test substance. 

Another aspect of the invention is pharmaceutical compositions comprising an effective amount of a PSP1 or
D87258 modulator of the invention and a pharmaceutically acceptable carrier. Pharmaceutical compositions of modulators
of this invention for parenteral administration, i.e., subcutaneously, intramuscularly or intravenously, or for oral
administration can be prepared.

The compositions for parenteral administration will commonly comprise a solution of the modulators of the invention 
or a cocktail thereof dissolved in an acceptable carrier, preferably an aqueous carrier. A variety of aqueous carriers
may be employed, e.g., water, buffered water, 0.4% saline, 0.3% glycine and the like. These solutions are sterile and 
generally free of particulate matter. These solutions may be sterilized by conventional, well-known sterilization tech- 
niques. The compositions may contain pharmaceutically acceptable auxiliary substances as required to approximate 
physiological conditions such as pH adjusting and buffering agents, etc. The concentration of the modulator of the 
invention in such pharmaceutical formulation can vary widely, i.e., from less than about 0.5%, usually at or at least 
about 1% to as much as 15 or 20% by weight and will be selected primarily based on fluid volumes, viscosities, etc. 
according to the particular mode of administration selected. 

Thus, a pharmaceutical composition of the modulator of the invention for intramuscular injection could be prepared 
to contain 1 mL sterile buffered water, and 50 mg of a protein of the invention. Similarly, a pharmaceutical composition 
of the modulator of the invention for intravenous infusion could be made up to contain 250 mL of sterile Ringer's solution,
and 150 mg of a modulator of the invention. Actual methods for preparing parenterally administrable compositions are
well known or will be apparent to those skilled in the art and are described in more detail in, for example, Remington's 
Pharmaceutical Science, 15th ed., Mack Publishing Company, Easton, Pennsylvania. 
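
As a rough arithmetic check, the two example formulations above can be converted to weight/volume percentages. This sketch assumes the carrier has a density near 1 g/mL, so that weight/volume approximates the by-weight figures quoted earlier; that assumption and the function name are illustrative only.

```python
def percent_w_v(mass_mg: float, volume_ml: float) -> float:
    """Weight/volume percent: grams of solute per 100 mL of solution."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    return (mass_mg / 1000.0) / volume_ml * 100.0


# Intramuscular example from the text: 50 mg in 1 mL of sterile buffered water.
intramuscular = percent_w_v(50, 1)
# Intravenous example from the text: 150 mg in 250 mL of sterile Ringer's solution.
intravenous = percent_w_v(150, 250)
print(f"{intramuscular:.2f}% w/v, {intravenous:.2f}% w/v")
```

Under that assumption the injection example is about 5% and the infusion example about 0.06%, the latter falling in the "less than about 0.5%" part of the stated range.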

The physician will determine the dosage of the present therapeutic agents which will be most suitable and it will 
vary with the form of administration and the particular compound chosen, and furthermore, it will vary with the particular 
patient under treatment. Generally, the physician will wish to initiate treatment with small dosages substantially less 
than the optimum dose of the compound and increase the dosage by small increments until the optimum effect under 
the circumstances is reached. It will generally be found that when the composition is administered orally, larger quan- 
tities of the active agent will be required to produce the same effect as a smaller quantity given parenterally. The 
therapeutic dosage will generally be from 0.1 to 1000 milligrams per day and higher although it may be administered 
in several different dosage units. 

Depending on the patient condition, the pharmaceutical composition of the invention can be administered for pro- 
phylactic and/or therapeutic treatments. In therapeutic applications, compositions containing the present compounds 
or a cocktail thereof are administered to a patient already suffering from a disease in an amount sufficient to cure or 
at least partially arrest the disease and its complications. In prophylactic applications, compositions containing the 
present compounds or a cocktail thereof are administered to a patient not already in a disease state to enhance the 
patient's resistance to the disease. 

Single or multiple administrations of the pharmaceutical compositions can be carried out with dose levels and 
pattern being selected by the treating physician. In any event, the pharmaceutical composition of the invention should
provide a quantity of the modulators of the invention sufficient to effectively treat the patient. 

Additionally, some diseases result from inherited defective genes. These genes can be detected by comparing the 
sequence of the defective gene with that of a normal one. Individuals carrying mutations in the PSP1 or D87258 gene 
may be detected at the DNA level by a variety of techniques. Nucleic acids used for diagnosis (genomic DNA, mRNA, 
etc.) may be obtained from a patient's cells, such as from blood, urine, saliva or tissue biopsy, e.g., chorionic villi 
sampling or removal of amniotic fluid cells and autopsy material. The genomic DNA may be used directly for detection 
or may be amplified enzymatically by using PCR, ligase chain reaction (LCR), strand displacement amplification (SDA), 
etc. prior to analysis. See, e.g., Saiki et al., Nature, 324, 163-166 (1986); Bej et al., Crit. Rev. Biochem. Molec. Biol.,
26, 301-334 (1991); Birkenmeyer et al., J. Virol. Meth., 35, 117-126 (1991); and van Brunt, J., Biotechnology, 8, 291-294
(1990). RNA or cDNA may also be used for the same purpose. As an example, PCR primers complementary to the
nucleic acid of the instant invention can be used to identify and analyze PSP1 or D87258 mutations. For example, 
deletions and insertions can be detected by a change in size of the amplified product in comparison to the normal 
PSP1 or D87258 genotype. Point mutations can be identified by hybridizing amplified DNA to radiolabeled PSP1 or
D87258 RNA of the invention or alternatively, radiolabeled PSP1 or D87258 antisense DNA sequences of the invention. 
Perfectly matched sequences can be distinguished from mismatched duplexes by RNase A digestion or by differences 
in melting temperatures (Tm). Such a diagnostic would be particularly useful for prenatal and even neonatal testing. 
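
The size comparison used above to detect deletions and insertions can be sketched as follows. The function name, the tolerance parameter and the fragment sizes are hypothetical; a point mutation leaves the amplicon size unchanged and would need the hybridization-based methods just described.

```python
def classify_amplicon(observed_bp: int, reference_bp: int, tolerance_bp: int = 0) -> str:
    """Compare an amplified product's size with the size expected for the normal genotype."""
    difference = observed_bp - reference_bp
    if abs(difference) <= tolerance_bp:
        return "no size change"
    return "insertion" if difference > 0 else "deletion"


# Hypothetical fragment sizes in base pairs, for illustration only.
reference_size = 300
for observed_size in (300, 290, 312):
    print(observed_size, classify_amplicon(observed_size, reference_size))
```

The tolerance parameter allows for the resolution limit of the gel system in use; the default of zero assumes single-base resolution.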

In addition, point mutations and other sequence differences between the reference gene and "mutant" genes can 
be identified by yet other well-known techniques, e.g., direct DNA sequencing, single-strand conformational polymorphism
(see Orita et al., Genomics) [...]
PCR product or a single-stranded template molecule generated by a modified PCR. The sequence determination is 
performed by conventional procedures with radiolabeled nucleotides or by automatic sequencing procedures with fluorescent
tags. Cloned DNA segments may also be used as probes to detect specific DNA segments. The sensitivity
of this method is greatly enhanced when combined with PCR. Further, point mutations and other sequence variations, 
such as polymorphisms, can be detected as described above, e.g., through the use of allele-specific oligonucleotides
for PCR amplification of sequences that differ by single nucleotides. Oligonucleotides having sequences as set forth
in SEQ ID NOs: 32, 33, 34, 35, 36, 37, 38, 39 and 40 are useful in such a method. These methods are useful for
determining the genetic predisposition to neurodegeneration in a patient by detecting polymorphisms within PSP1 or 
D87258 in a sample from a patient. Preferably, the polymorphisms detected are at nucleotide 672 of PSP1, at nucleotide
1435 of PSP1 or at nucleotide 1325 of D87258. Preferably, the polymorphisms are detected by PCR; most preferably
the polymorphisms are detected by PCR with oligonucleotides having a nucleotide sequence selected from the group
consisting of SEQ ID NOs: 32, 33, 34, 35, 36, 37, 38, 39 and 40. Preferably, the neurodegeneration predisposition
determined is to Alzheimer's disease. 

Genetic testing based on DNA sequence differences may be achieved by detection of alteration in electrophoretic 
mobility of DNA fragments in gels with or without denaturing agents. Small sequence deletions and insertions can be 
visualized by high resolution gel electrophoresis. DNA fragments of different sequences may be distinguished on de- 
naturing formamide gradient gels in which the mobilities of different DNA fragments are retarded in the gel at different 
positions according to their specific melting or partial melting temperatures. See, e.g., Myers et al., Science, 230, 1242
(1985). In addition, sequence alterations, in particular small deletions, may be detected as changes in the migration 
pattern of DNA heteroduplexes in non-denaturing gel electrophoresis such as heteroduplex electrophoresis. See, e.g.,
Nagamine et al., Am. J. Hum. Genet., 45, 337-339 (1989). Sequence changes at specific locations may also be
revealed by nuclease protection assays, such as RNase and S1 protection or the chemical cleavage method as disclosed
by Cotton et al. in Proc. Natl. Acad. Sci. USA, 85, 4397-4401 (1985).

Thus, the detection of a specific DNA sequence may be achieved by methods such as hybridization (e.g., heteroduplex
electrophoresis, see White et al., Genomics, 12, 301-306 (1992)), RNase protection (e.g., Myers et al., Science,
230, 1242 (1985)), chemical cleavage (e.g., Cotton et al., Proc. Natl. Acad. Sci. USA, 85, 4397-4401 (1985)), direct
DNA sequencing, or the use of restriction enzymes (e.g., restriction fragment length polymorphisms (RFLP) in which 
variations in the number and size of restriction fragments can indicate insertions, deletions, presence of nucleotide 
repeats and any other mutation which creates or destroys an endonuclease restriction sequence). Southern blotting of
genomic DNA may also be used to identify large (i.e., greater than 100 base pairs) deletions and insertions.

In addition to conventional gel electrophoresis and DNA sequencing, mutations such as microdeletions, aneuploidies,
translocations and inversions can also be detected by in situ analysis. See, e.g., Keller et al., DNA Probes, 2nd Ed.,
Stockton Press, New York, N.Y., USA (1993). That is, DNA or RNA sequences in cells can be analyzed for mutations 
without isolation and/or immobilization onto a membrane. Fluorescence in situ hybridization (FISH) is presently the 
most commonly applied method and numerous reviews of FISH have appeared. See, e.g., Trachuck et al., Science,
250, 559-562 (1990), and Trask et al., Trends Genet., 7, 149-154 (1991). Hence, by using nucleic acids based on the
structure of the PSP1 or D87258 genes, one can develop diagnostic tests for genetic mutations.

In addition, some diseases are a result of, or are characterized by changes in gene expression which can be 
detected by changes in the mRNA. Alternatively, the PSP1 or D87258 gene can be used as a reference to identify 
individuals expressing an increased or decreased level of PSP1 or D87258 mRNA, e.g., by Northern blotting or in situ 
hybridization. 

Defining appropriate hybridization conditions is within the skill of the art. See, e.g., "Current Protocols in Mol. Biol.,"
Vol. I & II, Wiley Interscience, Ausubel et al. (eds.) (1992). Probing technology is well known in the art and it is appreciated
that the size of the probes can vary widely but it is preferred that the probe be at least 15 nucleotides in length.
It is also appreciated that such probes can be and are preferably labeled with an analytically detectable reagent to 
facilitate identification of the probe. Useful reagents include but are not limited to radioisotopes, fluorescent dyes or 
enzymes capable of catalyzing the formation of a detectable product. As a general rule, the more stringent the hybrid- 
ization conditions the more closely related genes will be that are recovered. 
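
When choosing hybridization conditions for a short probe, a first estimate of duplex stability is often made with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. The sketch below applies that rule together with the preferred 15-nucleotide minimum stated above. The rule is a rough approximation for short oligonucleotides only and is not a method taught by this specification; the probe sequence and function names are hypothetical.

```python
def wallace_tm(probe: str) -> int:
    """Rough melting temperature (deg C) of a short oligonucleotide by the Wallace rule."""
    sequence = probe.upper()
    at_count = sequence.count("A") + sequence.count("T")
    gc_count = sequence.count("G") + sequence.count("C")
    if at_count + gc_count != len(sequence):
        raise ValueError("probe must contain only A, C, G and T")
    return 2 * at_count + 4 * gc_count


def meets_minimum_length(probe: str, minimum_nt: int = 15) -> bool:
    """Check the preferred minimum probe length of 15 nucleotides."""
    return len(probe) >= minimum_nt


# Hypothetical 15-mer, for illustration only.
example_probe = "ATGCATGCATGCATG"
print(meets_minimum_length(example_probe), wallace_tm(example_probe))
```

Washing closer to the estimated Tm corresponds to higher stringency, which is consistent with the general rule stated above that more stringent conditions recover more closely related sequences.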

The putative role of PSP1 or D87258 in presenilin biochemistry establishes yet another aspect of the invention 
which is gene therapy. "Gene therapy" means gene supplementation where an additional reference copy of a gene of 
interest is inserted into a patient's cells. As a result, the protein encoded by the reference gene corrects the defect and 
permits the cells to function normally, thus alleviating disease symptoms. The reference copy would be a wild-type 
form of the PSP1 or D87258 gene or a gene encoding a protein or peptide which modulates the activity of the endog- 
enous PSP1 or D87258. 

Gene therapy of the present invention can occur in vivo or ex vivo. Ex vivo gene therapy requires the isolation and 
purification of patient cells, the introduction of a therapeutic gene and introduction of the genetically altered cells back 
into the patient. A replication-deficient virus such as a modified retrovirus can be used to introduce the therapeutic
PSP1 or D87258 gene into such cells. Such vectors have been used in
clinical gene therapy trials. See, e.g., Boris-Lauerie et al., Curr. Opin. Genet. Dev., 3, 102-109 (1993).

In contrast, in vivo gene therapy does not require isolation and purification of a patient's cells. The therapeutic 
gene is typically "packaged" for administration to a patient such as in liposomes or in a replication-deficient virus such 
as adenovirus as described by Berkner, K.L., in Curr. Top. Microbiol. Immunol., 158, 39-66 (1992) or adeno-associated
virus (AAV) vectors as described by Muzyczka, N., in Curr. Top. Microbiol. Immunol., 158, 97-129 (1992) and U.S.
Patent No. 5,252,479. Another approach is administration of "naked DNA" in which the therapeutic gene is directly
injected into the bloodstream or muscle tissue. Yet another approach is administration of "naked DNA" in which the therapeutic
gene is introduced into the target tissue by microparticle bombardment using gold particles coated with the DNA.

Cell types useful for gene therapy of the present invention include lymphocytes, hepatocytes, myoblasts, fibroblasts,
any cell of the eye such as retinal cells, epithelial and endothelial cells. Preferably the cells are T lymphocytes
drawn from the patient to be treated, hepatocytes, any cell of the eye or respiratory or pulmonary epithelial cells. 
Transfection of pulmonary epithelial cells can occur via inhalation of a nebulized preparation of DNA vectors in liposomes,
DNA-protein complexes or replication-deficient adenoviruses. See, e.g., U.S. Patent No. 5,240,846.

Another aspect of the invention is transgenic, non-human mammals capable of expressing the polynucleotides of 
the invention or D87258 in any cell. Transgenic, non-human animals may be obtained by transfecting appropriate 
fertilized eggs or embryos of a host with the polynucleotides of the invention, with D87258 or with mutant forms found 
in human diseases. See, e.g., U.S. Patent Nos. 4,736,866; 5,175,385; 5,175,384 and 5,175,386. The resultant transgenic
animal may be used as a model for the study of PSP1 or D87258 gene function. Particularly useful transgenic
animals are those which display a detectable phenotype associated with the expression of the PSP1 or D87258 protein. 
Drug development candidates may then be screened for their ability to reverse or exacerbate the relevant phenotype. 

The present invention will now be described with reference to the following specific, non-limiting examples. 

Example 1 - Identification of the PS-1 Binding Partner PSP1 

A portion of PS-1 cDNA (GenBank Accession No. L42110) (SEQ ID NO: 9) encoding residues 269-413 of the PS-1
amino acid sequence (SEQ ID NO: 10) was PCR amplified with the oligonucleotide primers
5'-CGGAATTCCGTATGCTGGTTGAAACA-3' (SEQ ID NO: 11) and 5'-CGGGATCCTCAGGCTACGAAACAGGCTAT-3' (SEQ ID NO: 12). The
product was digested with EcoRI and BamHI and cloned into pEG202 (Golemis et al., in Current Protocols in Molecular
Biology, John Wiley & Sons, New York (1994)). The resulting plasmid, pCC352, encoded a fusion protein in which the
DNA binding protein, LexA, was fused in-frame to amino acids 269-413 of PS-1. The parent vector, pEG202, was a
yeast expression vector which uses the alcohol dehydrogenase (ADH1) promoter to express the LexA fusion proteins 
and HIS3 as the selectable marker. Sequence analysis using an automated DNA sequencer (Applied Biosystems, Inc.) 
confirmed that the amplified region had the correct sequence and was fused in-frame to LexA. 
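
The cloning design above can be checked informally: each primer should carry the recognition site of the enzyme used to cut its end of the product, and residues 269-413 correspond to a fixed coding length. The sketch below is illustrative only. The primer strings are transcribed from the text of this example, the recognition sequences (GAATTC for EcoRI, GGATCC for BamHI) are the standard ones, and the function names are hypothetical.

```python
# Standard recognition sequences for the two enzymes named in the text.
RECOGNITION_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}

# Primer sequences as transcribed from the example (SEQ ID NOs: 11 and 12).
FORWARD_PRIMER = "CGGAATTCCGTATGCTGGTTGAAACA"
REVERSE_PRIMER = "CGGGATCCTCAGGCTACGAAACAGGCTAT"


def find_sites(primer: str) -> dict:
    """Map each enzyme whose site occurs in the primer to the 0-based site position."""
    return {
        enzyme: primer.find(site)
        for enzyme, site in RECOGNITION_SITES.items()
        if site in primer
    }


def coding_length_nt(first_residue: int, last_residue: int) -> int:
    """Nucleotides needed to encode an inclusive range of amino acid residues."""
    return (last_residue - first_residue + 1) * 3


print(find_sites(FORWARD_PRIMER), find_sites(REVERSE_PRIMER), coding_length_nt(269, 413))
```

On this reading the forward primer carries the EcoRI site and the reverse primer the BamHI site, each after a two-base 5' extension, and the encoded PS-1 segment spans 145 residues.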

All procedures, plasmids and strains used in the two-hybrid screen have been described in detail by Golemis et
al., supra. Yeast strain EGY48 (MATa, trp1, his3, ura3, 6ops-LEU2) was cotransformed with the plasmids pCC352 and
pSH18-34. Transformants were selected using complete minimal media lacking uracil and histidine. The plasmid
pSH18-34 is a yeast expression vector in which eight LexA operator sites are located upstream of a minimal GAL1
promoter which drives the expression of the LacZ gene, and which carries URA3 as a selectable marker. Synthesis of the full length
LexA-PS-1 fusion was confirmed by Western blot analysis of yeast extracts using polyclonal antisera directed against
LexA. It was confirmed that the LexA-PS-1 fusion alone was unable to activate either the LEU2 or the LacZ reporter.
In addition, the ability of the LexA-PS-1 fusion to enter the nucleus and bind DNA was confirmed using a
repression assay. 

A strain containing the LexA-PS-1 fusion and pSH18-34 (CCY321) was transformed with a human fetal brain cDNA
library (Clontech) in plasmid pJG4-5 using a library scale transformation protocol. This library plasmid contains the 
TRP1 selectable marker and allows the expression of cDNAs as fusions (AD fusions) to a cassette containing the SV40 
nuclear localization sequence, the acid blob B42, and the hemagglutinin epitope tag. See Gyuris et al., Cell 75, 791-803
(1993). Expression of this fusion is under control of the galactose inducible promoter GAL1. Transformation reactions
were plated onto complete minimal media lacking uracil, histidine and tryptophan. Approximately 4.5 x 10^6 individual
transformants were obtained, pooled and frozen. To ensure that each primary colony was replated during the selection
procedure, 2 x 10^7 viable cells (approximately 3 times the number of individual transformants) were plated onto minimal
media lacking uracil, histidine, tryptophan and leucine with galactose/raffinose as the carbon source to induce expression
of AD fusions. Colonies arising after 3 and 4 days of growth at 30 °C were picked to complete minimal media
lacking uracil, histidine and tryptophan. Colonies containing potential interacting fusion proteins were then tested for 
galactose dependence and LacZ expression. Those isolates which activated both the LEU2 and LacZ reporters in a 
galactose dependent fashion were considered positive and pursued further. Plasmids were isolated from yeast, used 
to transform E. coli strain KC8, and AD fusion plasmids selected by growth on minimal E. coli media lacking tryptophan.
Each AD fusion plasmid containing a potential interacting fusion was used to transform CCY321. Several transformants
were subjected to screening for galactose dependence. To determine whether the interaction was
specific, the ability of each AD fusion plasmid to interact with 22 nonrelated LexA fusion proteins was tested. AD fusion 
plasmids which passed this second round of screening and interacted specifically with the LexA-PS-1 fusion were 
identified. 

Example 2 - PSP1 cDNA Cloning and Sequence Analysis

The AD fusion plasmids were subjected to restriction digest analysis and sequencing as indicated above. Sequence
analysis of one of the interacting fusion protein cDNAs revealed a 519 nucleotide open reading frame (SEQ ID NO: 1)
encoding a 173 amino acid (SEQ ID NO: 2) protein starting with a GGA at position 2 and terminating with a TGA at
position 523 of SEQ ID NO: 1. GenBank searches using the BLASTX and BLASTN algorithms with the cDNA sequence
or with the deduced amino acid sequence indicated homology to a portion of the E. coli serine protease htrA described 
by Lipinska et al., supra, (SEQ ID NOs: 13 and 14). This novel cDNA was designated PSP1. 

To obtain a greater portion of the cDNA, the oligonucleotide, 5'-CTGGATGGGGAGGTGATTGGAGTG-3' (SEQ ID 
NO: 15) representing bp 83-106 of SEQ ID NO: 1, was used to screen a Superscript human brain cDNA library (Gibco
BRL) using the Genetrapper cDNA positive selection system (Gibco BRL). Colonies were screened using whole cell
PCR or standard hybridization conditions as described by Innis et al., PCR Protocols: A Guide to Methods and Applications,
Academic Press, San Diego, CA (1990) and Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd
ed., Cold Spring Harbor Press, Cold Spring Harbor, NY (1989). Those isolates which contained PSP1 were subjected
to restriction digest analysis and sequencing. The longest clones, SEQ ID NO: 3 and SEQ ID NO: 5, were sequenced
in their entirety.
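
The stated coordinates of the screening oligonucleotide can be checked with a plain string search. The following Python sketch is illustrative only and is not part of the disclosed method; the names SEQ1_5PRIME, PROBE_SEQ15 and locate are hypothetical, and the template is assumed to be the first 120 nucleotides of SEQ ID NO: 1 as transcribed from the sequence listing.

```python
# First 120 nt of SEQ ID NO: 1, transcribed from the sequence listing
# (assumption: the first two printed rows are free of transcription errors).
SEQ1_5PRIME = (
    "GGGACTCCCCCAAACCAATGTGGAATACATTCAAACTGATGCAGCTATTGATTTTGGAAA"
    "CTCTGGAGGTCCCCTGGTTAACCTGGATGGGGAGGTGATTGGAGTGAACACCATGAAGGT"
)

# Screening oligonucleotide, SEQ ID NO: 15.
PROBE_SEQ15 = "CTGGATGGGGAGGTGATTGGAGTG"


def locate(oligo, template):
    """Return the 1-based inclusive (start, end) of oligo in template, or None."""
    index = template.find(oligo)
    if index < 0:
        return None
    return (index + 1, index + len(oligo))


if __name__ == "__main__":
    print(locate(PROBE_SEQ15, SEQ1_5PRIME))  # (83, 106)
```

The search returns bp 83-106, in agreement with the coordinates given for SEQ ID NO: 15.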

Sequence analysis of SEQ ID NO: 3 revealed a 969 nucleotide open reading frame encoding a 323 amino acid 
(SEQ ID NO: 4) protein starting with a CCC at position 1 and terminating with a TGA at position 972 of SEQ ID NO: 
3. Sequence analysis of SEQ ID NO: 5 revealed a 1269 nucleotide open reading frame encoding a 423 amino acid
(SEQ ID NO: 6) protein starting with a CTT at position 1 and terminating with a TGA at position 1272 of SEQ ID NO: 5.
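
The open reading frame coordinates reported above follow from simple arithmetic: a protein of n residues corresponds to 3n coding nucleotides, and the stop codon occupies the next three positions. The Python sketch below is illustrative; the function name orf_coordinates is hypothetical, and it assumes that the reported stop position is the last base of the stop codon. On that assumption, the 423 residue protein of SEQ ID NO: 6 corresponds to 1269 coding nucleotides.

```python
def orf_coordinates(first_codon_start, n_residues):
    """Return (coding_nt, stop_end) for an ORF.

    first_codon_start: 1-based position of the first base of the first codon.
    n_residues: number of encoded amino acids (stop codon excluded).
    stop_end: 1-based position of the last base of the stop codon (assumed
              to be the position reported in the text).
    """
    coding_nt = 3 * n_residues
    stop_end = first_codon_start + coding_nt + 2
    return coding_nt, stop_end


if __name__ == "__main__":
    # (label, first codon start, residues) as reported for each clone
    clones = [
        ("SEQ ID NO: 1", 2, 173),
        ("SEQ ID NO: 3", 1, 323),
        ("SEQ ID NO: 5", 1, 423),
    ]
    for label, start, residues in clones:
        coding_nt, stop_end = orf_coordinates(start, residues)
        print(label, coding_nt, stop_end)
```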

A second round of screening was performed using the oligonucleotide, 5'-GTCTCTGGGCCCCGGTTGTCTGTTG-3'
(SEQ ID NO: 16) representing bp 5-28 of SEQ ID NO: 5; the library and screening protocol remained unchanged.
In the second round of screening, the isolate designated SEQ ID NO: 7 contained the longest cDNA clone. Sequence 
analysis of SEQ ID NO: 7 revealed a 1374 nucleotide open reading frame encoding a 458 amino acid (SEQ ID NO: 8) 
protein starting with an ATG at position 251 and terminating with a TGA at position 1627 of SEQ ID NO: 7. However, 
SEQ ID NO: 7 does not have a stop codon upstream from the potential initiation codon. To confirm that the predicted 
start codon is authentic, the 5' nucleotide sequence was extended with 5' RACE using "Marathon Ready" human brain 
cDNA (Clontech) and a nested set of primers. A SEQ ID NO: 7 specific primer, 5'-CCAACAGACAACCGGGCCCAGAGACT-3'
(SEQ ID NO: 20), and 5' anchor primer-1 (Clontech) were used in the first PCR amplification, and a SEQ ID
NO: 7 specific primer, 5'-TGCCTCCTCGCCCGCCCTACTCAGA-3' (SEQ ID NO: 21), and 5' anchor primer-2 (Clontech)
were used in the second PCR amplification. PCR products were T/A cloned into pCR2.1 (Invitrogen). Eighteen isolates
with staggered 5' ends were analyzed and a 5' consensus sequence of 587 nucleotides was generated (SEQ ID NO: 
22). Alignment of SEQ ID NO: 22 and SEQ ID NO:7 to generate a consensus sequence (SEQ ID NO: 23) indicates 
that at nucleotide position 225 there is an in frame stop codon and the first methionine corresponds to that predicted 
in SEQ ID NO: 7. This gene is designated PSP1-2. 
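
Gene specific primers for 5' RACE must be antisense to the cDNA so that extension proceeds toward the 5' end. The Python sketch below checks this for the two primers named above; it is illustrative only. The names reverse_complement, SENSE_CODING and SENSE_5UTR are hypothetical, and the two sense strand fragments are assumed to be correctly transcribed from SEQ ID NO: 7 as printed in the sequence listing.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Sense-strand fragments transcribed from SEQ ID NO: 7 (assumption: no
# transcription errors in these stretches of the printed listing).
SENSE_CODING = "ACCCCCAGTCTCTGGGCCCGGTTGTCTGTTGGGGTCACT"  # within the ORF
SENSE_5UTR = "GGGTGCCGCCTCTGAGTAGGGCGGGCGAGGAGGCAGCCAA"   # upstream of the ATG

PRIMER_SEQ20 = "CCAACAGACAACCGGGCCCAGAGACT"  # first-round primer
PRIMER_SEQ21 = "TGCCTCCTCGCCCGCCCTACTCAGA"   # nested second-round primer

if __name__ == "__main__":
    # Each primer anneals to the sense strand if its reverse complement
    # occurs in the corresponding sense-strand fragment.
    print(reverse_complement(PRIMER_SEQ20) in SENSE_CODING)
    print(reverse_complement(PRIMER_SEQ21) in SENSE_5UTR)
```

Both checks succeed, and the second-round primer site lies 5' of the first-round primer site, as expected for a nested pair.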

Consensus full length sequences for the genes designated PSP1-1 (SEQ ID NOs: 24 and 25), PSP1-3 (SEQ ID 
NOs: 26 and 27) and PSP1-4 (SEQ ID NOs: 28 and 29) were generated from alignment of the 5' consensus sequence 
(SEQ ID NO: 22), other partial PSP1 clones, and with SEQ ID NOs: 7, 3 and 5, respectively. 

Alignment of the deduced amino acid sequence of PSP1-1 (SEQ ID NO: 25) to E. coli htrA (SEQ ID NO: 14) was 
accomplished using the BESTFIT algorithm (University of Wisconsin Genetics Computer Group). An approximate similarity
of 55% and an identity of 33.5% at the amino acid level were observed, as shown in Fig. 1 (top, PSP1-1;
bottom, E. coli htrA). The critical histidine and the serine motif GXSXG conserved in all serine proteases are present in
PSP1-1 at amino acid positions 198 and 304-308, respectively, and are indicated in bold. Amino acid numbers are
indicated at the left and right of the sequence alignment. 
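
The GXSXG motif can be located with a regular expression. The Python sketch below is illustrative only; the function name find_gxsxg is hypothetical, and the input is residues 302-317 of SEQ ID NO: 8 converted to one-letter code, on the assumption that the numbering of PSP1-2 matches that of PSP1-1 in this region.

```python
import re

# Residues 302-317 of SEQ ID NO: 8 in one-letter code
# (Asp Phe Gly Asn Ser Gly Gly Pro Leu Val Asn Leu Asp Gly Glu Val).
FRAGMENT = "DFGNSGGPLVNLDGEV"
FRAGMENT_START = 302  # 1-based position of the first residue of FRAGMENT


def find_gxsxg(protein, offset=1):
    """Return (start, end, motif) for every G-X-S-X-G match, 1-based inclusive.

    A lookahead is used so that overlapping matches are also reported.
    """
    hits = []
    for match in re.finditer(r"(?=(G.S.G))", protein):
        start = match.start() + offset
        hits.append((start, start + 4, match.group(1)))
    return hits


if __name__ == "__main__":
    print(find_gxsxg(FRAGMENT, FRAGMENT_START))  # [(304, 308, 'GNSGG')]
```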

Nucleotide sequence comparison of PSP1-2, PSP1-1, PSP1-3 and PSP1-4 using the PILEUP and PRETTY algorithms
(University of Wisconsin Genetics Computer Group) with gap creation and extension penalties of 5.0 and 0.3,
respectively, is shown in Fig. 2. The alignment results indicate that at nucleotide position 1541 of the alignment, PSP1-2
and PSP1-1 contain a 225 bp deletion and PSP1-4 contains a 195 bp deletion. Within the same alignment at nucleotide
position 1942, PSP1-4 lacks 96 bp that are present in PSP1-2, PSP1-1 and PSP1-3. At the junction of each deletion
site there is a splice site consensus sequence AGG or TGG (indicated in bold), suggesting that these alternate forms
are due to alternative splicing. See Mount, S., Nucl. Acids Res. 10, 458-472 (1982). The apparent splicing event at
position 1541 results in the removal of a stop codon (underlined in Fig. 2) that is present in PSP1-3. In addition, PSP1-2
and PSP1-1 contain a single nucleotide difference at position 672 of the alignment: PSP1-2 contains a T at this position,
producing the codon TGC which codes for a cysteine, while PSP1-1 contains a C at the same position, producing the
codon CGC which codes for an arginine.
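
In the standard genetic code TGC encodes cysteine and CGC encodes arginine, which is consistent with the Arg to Cys change described in Example 4. The Python sketch below builds the standard codon table and translates the two codons; it is illustrative only, and the names CODON_TABLE and translate_codon are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order
# (first base varies slowest, third base fastest); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon):
    """Return the one-letter amino acid for a DNA codon ('*' for stop)."""
    return CODON_TABLE[codon.upper()]


if __name__ == "__main__":
    print(translate_codon("TGC"))  # C (cysteine)
    print(translate_codon("CGC"))  # R (arginine)
```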

Nucleotide sequence comparison of PSP1-1 (SEQ ID NO: 24) to the putative human serine protease of Ohno et 
al., supra, (SEQ ID NO: 17) indicated a 49% identity using the GAP algorithm and 65% using the BESTFIT algorithm 
(data not shown). Alignment of the deduced amino acid sequence of PSP1-1 (SEQ ID NO: 25) to the D87258 protease 
of Ohno et al., supra, (SEQ ID NO: 18) was accomplished using the BESTFIT algorithm and is shown in Fig. 3 (top,
PSP1-1; bottom, Ohno et al. D87258 protease). An approximate identity of 46% at the amino acid level was observed.

Example 3 - Tissue Distribution of PSP1 

Northern analysis was carried out to determine the distribution of PSP1 mRNA in human tissues. A 30-base oligonucleotide
probe directed against the PSP1 sequence was used (5'-ATGCTGAACATCGGGAAAGCTTGGTTCTCG-3')
(SEQ ID NO: 19). This probe was 3'-end labelled with [32P]-dATP. Northern blots containing mRNA from multiple
human tissues (Clontech #7750-1, #7760-1, and #7755-1) were hybridized with this probe under stringent conditions.
A major band of approximately 1.9 kb was detected in all regions investigated: heart, brain, lung, placenta, liver, skeletal
muscle, kidney, pancreas, amygdala, caudate nucleus, corpus callosum, hippocampus, substantia nigra, subthalamic 
nucleus, thalamus, cerebellum, cerebral cortex, medulla, spinal cord, occipital pole, frontal lobe, temporal pole, and 
putamen. PSP1 mRNA was also detected in Alzheimer's disease brain. 

Example 4 - Detecting the PSP1 polymorphisms 

PSP1 oligonucleotides 1AFC, 1AFT and 1AR were designed for detecting the polymorphism at nucleotide 672
(cytidine to thymine) causing the Arg to Cys amino acid change. The Allele Specific Oligonucleotides (ASO) 1AFC and
1AFT are identical apart from their 3' end bases and provide the specificity for screening for the polymorphism.

1AFC: CAT CCG GCA TTG TTA GCT CTG C 22mer (SEQ ID NO:32)
1AFT: CAT CCG GCA TTG TTA GCT CTG T 22mer (SEQ ID NO:33)
1AR: CAA TAG CTG CAT CAG TTT GAA TG 23mer (SEQ ID NO:34)
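
That the two allele specific oligonucleotides differ only at their 3' terminal base can be confirmed by a position-wise comparison. The Python sketch below is illustrative only; the function name aso_differences is hypothetical.

```python
def aso_differences(oligo_a, oligo_b):
    """Return a list of (1-based position, base_a, base_b) where two
    equal-length oligonucleotides differ. Spaces in the input are ignored."""
    a = oligo_a.replace(" ", "")
    b = oligo_b.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("oligonucleotides must have the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Allele specific oligonucleotides as listed above (SEQ ID NOs: 32 and 33).
ASO_1AFC = "CAT CCG GCA TTG TTA GCT CTG C"
ASO_1AFT = "CAT CCG GCA TTG TTA GCT CTG T"

if __name__ == "__main__":
    print(aso_differences(ASO_1AFC, ASO_1AFT))  # [(22, 'C', 'T')]
```

The single difference falls at position 22, the 3' end of each 22mer, which is where a mismatch most strongly impairs primer extension.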

Pairs of oligonucleotides (1AFC + 1AR, or 1AFT + 1AR) were used in a PCR under the following conditions: 94°C
for 40 seconds, 60°C for 30 seconds, for 35 cycles in a reaction containing 1 U KlenTaq1 (GenPak Ltd.), 50mM Tris-Cl
pH9.1, 16mM ammonium sulphate, 3.5mM MgCl2, 150ug ml-1 BSA and 25ng of human genomic DNA of unknown
source. Each pair of oligonucleotides was tested against 12 random samples of genomic DNA and the products electrophoresed
on a 4% agarose (Gibco-BRL) gel. The expected product of 95 base pairs was seen for both ASOs in 8
of the 12 DNAs, indicating that these individuals are heterozygous for this polymorphism. Two of the DNAs amplified
with only the 1AFC oligonucleotide and are thus homozygous for the allele with the cytidine at this position. Two of the
DNAs amplified with only the 1AFT oligonucleotide and are thus homozygous for the allele with the thymine at this
position.
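
The genotype assignments above follow a simple rule: amplification with both ASOs indicates a heterozygote, and amplification with only one ASO indicates a homozygote for that allele. The Python sketch below applies this rule to the 12 samples as reported and derives the allele frequencies; it is illustrative only, and the names call_genotype and allele_frequency are hypothetical.

```python
from collections import Counter


def call_genotype(amplified_c, amplified_t):
    """Assign a genotype from ASO-PCR results for the C and T primers."""
    if amplified_c and amplified_t:
        return "C/T"
    if amplified_c:
        return "C/C"
    if amplified_t:
        return "T/T"
    return "no call"


def allele_frequency(genotypes, allele):
    """Fraction of called alleles equal to the given allele."""
    alleles = [a for g in genotypes if g != "no call" for a in g.split("/")]
    return alleles.count(allele) / len(alleles)


if __name__ == "__main__":
    # (1AFC amplified, 1AFT amplified) for the 12 samples as reported:
    # 8 with both products, 2 with 1AFC only, 2 with 1AFT only.
    results = [(True, True)] * 8 + [(True, False)] * 2 + [(False, True)] * 2
    genotypes = [call_genotype(c, t) for c, t in results]
    print(Counter(genotypes))
    print(allele_frequency(genotypes, "C"))  # 0.5
```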

PSP1 oligonucleotides 1BFC, 1BFT and 1BR were designed for detecting the polymorphism at nucleotide 1435 
(cytidine to thymine) causing the Ala to Val amino acid change. 

1BFC: TGG CGG GCT TTG GGG GGC ATT C 22mer (SEQ ID NO:35) 
1BFT: TGG CGG GCT TTG GGG GGC ATT T 22mer (SEQ ID NO:36)
1BR: GAC GTC AGC AGG GCC CGG AGG TC 23mer (SEQ ID NO:37) 

Pairs of oligonucleotides (1BFC + 1BR, or 1BFT + 1BR) were used in a PCR under the following conditions: 94°C
for 40 seconds, 67°C for 30 seconds, for 35 cycles in a reaction containing 1 U KlenTaq1 (GenPak Ltd.), 50mM Tris-Cl
pH9.1, 16mM ammonium sulphate, 3.5mM MgCl2, 150ug ml-1 BSA and 25ng of human genomic DNA of unknown
source. Each pair of oligonucleotides was tested against 12 random samples of genomic DNA and the products electrophoresed
on a 4% agarose (Gibco-BRL) gel. The expected product of 75 base pairs was seen using the 1BFT ASO
in 9 of the 12 samples, indicating that the other 3 individuals have a different allele at this position.



Example 5 - Detecting the D87258 polymorphism 

Oligonucleotides 2AFG, 2AFT and 2AR were designed for detecting the polymorphism at nucleotide 1325 (guanine
to thymine) causing the Gly to Val amino acid change.

2AFG: GAT ACC CCA GCA GAA GCT GG 20mer (SEQ ID NO:38)
2AFT: GAT ACC CCA GCA GAA GCT GT 20mer (SEQ ID NO:39)
2AR: GCT GAC ATC ATT GGC GGA GAC 21mer (SEQ ID NO:40)

Pairs of oligonucleotides (2AFG + 2AR, or 2AFT + 2AR) were used in a PCR under the following conditions: 94°C
for 40 seconds, 62°C for 30 seconds, for 35 cycles in a reaction containing 1 U KlenTaq1 (GenPak Ltd.), 50mM Tris-Cl
pH9.1, 16mM ammonium sulphate, 3.5mM MgCl2, 150ug ml-1 BSA and 25ng of human genomic DNA of unknown
source. Each pair of oligonucleotides was tested against 12 random samples of genomic DNA and the products electrophoresed
on a 4% agarose (Gibco-BRL) gel. The 2AFT ASO generated a band of approximately 1000 bp. The
predicted band was 90 bp. Presumably the presence of the larger bands was due to the presence of an intron in the
region flanked by oligonucleotides 2AR and 2AFT. Bands were observed in all of the samples amplified with 2AFT,
indicating that the allele containing the thymine is present in all 12 individuals.

The present invention may be embodied in other specific forms without departing from the spirit or essential at- 
tributes thereof, and, accordingly, reference should be made to the appended claims, rather than to the foregoing 
specification, as indicating the scope of the invention. 

SEQUENCE LISTING

(1) GENERAL INFORMATION

(i) APPLICANT: Creasy, Caretha
    Livi, George
    Karran, Eric
    Clinkenbeard, Helen
    Browne, Michael
    Southan, Christopher

(ii) TITLE OF THE INVENTION: HUMAN SERINE PROTEASE

(iii) NUMBER OF SEQUENCES: 40

(iv) CORRESPONDENCE ADDRESS:
    (A) ADDRESSEE: SmithKline Beecham Corporation
    (B) STREET: 709 Swedeland Road
    (C) CITY: King of Prussia
    (D) STATE: PA
    (E) COUNTRY: USA
    (F) ZIP: 19406

(v) COMPUTER READABLE FORM:
    (A) MEDIUM TYPE: Diskette
    (B) COMPUTER: IBM Compatible
    (C) OPERATING SYSTEM: DOS
    (D) SOFTWARE: FastSEQ Version 1.5

(vi) CURRENT APPLICATION DATA:
    (A) APPLICATION NUMBER:
    (B) FILING DATE:
    (C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:
    (A) APPLICATION NUMBER: 60/025436
    (B) FILING DATE: 06-SEPT-1996

(viii) ATTORNEY/AGENT INFORMATION:
    (A) NAME: Baumeister, Kirk
    (B) REGISTRATION NUMBER: 33,833
    (C) REFERENCE/DOCKET NUMBER: P50547F2

(ix) TELECOMMUNICATION INFORMATION:
    (A) TELEPHONE: 610-270-5096
    (B) TELEFAX: 610-270-5090
    (C) TELEX:



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 732 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

GGGACTCCCC CAAACCAATG TGGAATACAT TCAAACTGAT GCAGCTATTG ATTTTGGAAA 60
CTCTGGAGGT CCCCTGGTTA ACCTGGATGG GGAGGTGATT GGAGTGAACA CCATGAAGGT 120
CACAGCTGGA ATCTCCTTTG CCATCCCTTC TGATCGTCTT CGAGAGTTTC TGCATCGTGG 180
GGAAAAGAAG AATTCCTCCT CCGGAATCAG TGGGTCCCAC CGGCGCTACA TTGGGGTGAT 240
GATGCTGACC CTGAGTCCCA GCATCCTTGC TGAACTACAG CTTCGAGAAC CAAGCTTTCC 300
CGATGTTCAG CATGGTGTAC TCATCCATAA AGTCATCCTG GGCTCCCCTG CACACCGGGC 360
TGGTCTGCGG CCTGGTGATG TGATTTTGGC CATTGGGGAG CAGATGGTAC AAAATGCTGA 420
AGATGTTTAT GAAGCTGTTC GAACCCAATC CCAGTTGGCA GTGCAGATCC GGCGGGGACG 480
AGAAACACTG ACCTTATATG TGACCCCTGA GGTCACAGAA TGAATAGATC ACCAAGAGTA 540
TGAGGCTCCT GCTCTGATTT CCTCCTTGCC TTTCTGGCTG AGGTTCTGAG GGCACCGAGA 600
CAGAGGGTTA AATGAACCAG TGGGGGCAGG TCCCTCCAAC CACCAGCACT GACTCCTGGG 660
CTCTGAAGAA TCACAGAAAC ACTTTTTATA TAAAATAAAA TTATACCTAG CAACAAAAAA 720
AAAAAAAAAA AA 732

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 173 amino acids
    (B) TYPE: amino acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Gly Leu Pro Gln Thr Asn Val Glu Tyr Ile Gln Thr Asp Ala Ala Ile
1               5                   10                  15
Asp Phe Gly Asn Ser Gly Gly Pro Leu Val Asn Leu Asp Gly Glu Val
            20                  25                  30
Ile Gly Val Asn Thr Met Lys Val Thr Ala Gly Ile Ser Phe Ala Ile
        35                  40                  45
Pro Ser Asp Arg Leu Arg Glu Phe Leu His Arg Gly Glu Lys Lys Asn
    50                  55                  60
Ser Ser Ser Gly Ile Ser Gly Ser Gln Arg Arg Tyr Ile Gly Val Met
65                  70                  75                  80
Met Leu Thr Leu Ser Pro Ser Ile Leu Ala Glu Leu Gln Leu Arg Glu
                85                  90                  95
Pro Ser Phe Pro Asp Val Gln His Gly Val Leu Ile His Lys Val Ile
            100                 105                 110
Leu Gly Ser Pro Ala His Arg Ala Gly Leu Arg Pro Gly Asp Val Ile
        115                 120                 125
Leu Ala Ile Gly Glu Gln Met Val Gln Asn Ala Glu Asp Val Tyr Glu
    130                 135                 140
Ala Val Arg Thr Gln Ser Gln Leu Ala Val Gln Ile Arg Arg Gly Arg
145                 150                 155                 160
Glu Thr Leu Thr Leu Tyr Val Thr Pro Glu Val Thr Glu
                165                 170






(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 1787 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

CCCAGTCTCT CCGCCCGGTT GTCTGTTGGG GTCACTGAAC CCCGAGCATG CCTGACGTCT 60
GGGACCCCGG GTCCCCGGGC ACAACTGACT GCGGTGACCC CAGATACCAG GACCCGGGAG 120
GCCTCAGAGA ACTCTGGAAC CCGTTCGCGC GCGTGGCTGG CGGTGGCGCT GGGCGCTGGG 180
GGGGCAGTGC TGTTGTTGTT GTGGGGCGGG GGTCGGGGTC CTCCGGCCGT CCTCGCCGCC 240
GTCCCTAGCC CGCCGCCCGC TTCTCCCCGG AGTCAGTACA ACTTCATCGC AGATGTGGTG 300
GAGAAGACAG CACCTGCCGT GGTCTATATC GAGATCCTGG ACCGGCACCC TTTCTTGGGC 360
CCCGAGGTCC CTATCTCGAA CGGCTCAGGA TTCGTGGTGG CTGCCGATGG GCTCATTGTC 420
ACCAACGCCC ATGTGGTGGC TGATCGGCGC AGAGTCCGTG TGAGACTGCT AAGCGGCGAC 480
ACGTATGAGG CCGTGGTCAC AGCTGTGGAT CCCGTGGCAG ACATCGCAAC GCTGAGGATT 540
CAGACTAAGG AGCCTCTCCC CACGCTGCCT CTGGGACGCT CAGCTGATGT CCGGCAAGGG 600
GAGTTTGTTG TTGCCATGGG AAGTCCCTTT GCACTGCAGA ACACGATCAC ATCCGGCATT 660
GTTAGCTCTG CTCAGCGTCC AGCCAGAGAC CTGGGACTCC CCCAAACCAA TGTGGAATAC 720
ATTCAAACTG ATGCAGCTAT TGATTTTGGA AACTCTGGAG GTCCCCTGGT TAACCTGGTG 780
AGTGAGACAT CCTTCCTTCC AAGAATCCCT GCCCCAGGTC AGTGTGGGAA GGGTAGGTTT 840
CCCCTAATTC AAGGATGTTT GGTCAAGTTT CTGAGCAGTT CTTTGTTGGC TATCTCTCAA 900
TATCCAACCA GATCTCCCCA ACACTTGCTG GTACTTTTGT TCGGGTGCCC CCATCCCCTA 960
CTATTTGTTT AGGCTAGGGA ACTGGGGGCT GTATCCCTGC AGGATGGGGA GGTGATTGGA 1020
GTGAACACCA TGAAGGTCAC AGCTGGAATC TCCTTTGCCA TCCCTTCTGA TCGTCTTCGA 1080
GAGTTTCTGC ATCGTGGGGA AAAGAAGAAT TCCTCCTCCG GAATCAGTGG GTCCCAGCGG 1140
CGCTACATTG GGGTGATGAT GCTGACCCTG AGTCCCAGCA TCCTTGCTGA ACTACAGCTT 1200
CGAGAACCAA GCTTTCCCGA TGTTCAGCAT GGTGTACTCA TCCATAAAGT CATCCTGGGC 1260
TCCCCTGCAC ACCGGGCTGG TCTGCGGCCT GGTGATGTGA TTTTGGCCAT TGGGGAGCAG 1320
ATGGTACAAA ATGCTGAAGA TGTTTATGAA GCTGTTCGAA CCCAATCCCA GTTGGCAGTG 1380
CAGATCCGGC GGGGACGAGA AACACTGACC TTATATGTGA CCCCTGAGGT CACAGAATGA 1440
ATAGATCACC AAGAGTATGA GGCTCCTGCT CTGATTTCCT CCTTGCCTTT CTGGCTGAGG 1500
TTCTGACGGC ACCGAGACAG AGGGTTAAAT GAACCAGTGG GGGCAGGTCC CTCCAACCAC 1560
CAGCACTGAC TCCTGGGCTC TGAAGAATCA CAGAAACACT TTTTATATAA AATAAAATTA 1620
TACCTAGCAA CATATTATAG TAAAAAATGA GGTGGGAGGG CTGGATCTTT TCCCCCACCA 1680
AAAGGCTAGA GGTAAAGCTG TATCCCCCTA AACTTAGGGG AGATACTGGA GCTGACCATC 1740
CTGACCTCCT ATTAAAGAAA ATGAGCTGCT GAAAAAAAAA AAAAAAA 1787

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 323 amino acids
    (B) TYPE: amino acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Pro Ser Leu Trp Ala Arg Leu Ser Val Gly Val Thr Glu Pro Arg Ala
1               5                   10                  15
Cys Leu Thr Ser Gly Thr Pro Gly Pro Arg Ala Gln Leu Thr Ala Val
            20                  25                  30
Thr Pro Asp Thr Arg Thr Arg Glu Ala Ser Glu Asn Ser Gly Thr Arg
        35                  40                  45
Ser Arg Ala Trp Leu Ala Val Ala Leu Gly Ala Gly Gly Ala Val Leu
    50                  55                  60
Leu Leu Leu Trp Gly Gly Gly Arg Gly Pro Pro Ala Val Leu Ala Ala
65                  70                  75                  80
Val Pro Ser Pro Pro Pro Ala Ser Pro Arg Ser Gln Tyr Asn Phe Ile
                85                  90                  95
Ala Asp Val Val Glu Lys Thr Ala Pro Ala Val Val Tyr Ile Glu Ile
            100                 105                 110
Leu Asp Arg His Pro Phe Leu Gly Arg Glu Val Pro Ile Ser Asn Gly



18 



EP 0 828 003 A2 



        115                 120                 125
Ser Gly Phe Val Val Ala Ala Asp Gly Leu Ile Val Thr Asn Ala His
    130                 135                 140
Val Val Ala Asp Arg Arg Arg Val Arg Val Arg Leu Leu Ser Gly Asp
145                 150                 155                 160
Thr Tyr Glu Ala Val Val Thr Ala Val Asp Pro Val Ala Asp Ile Ala
                165                 170                 175
Thr Leu Arg Ile Gln Thr Lys Glu Pro Leu Pro Thr Leu Pro Leu Gly
            180                 185                 190
Arg Ser Ala Asp Val Arg Gln Gly Glu Phe Val Val Ala Met Gly Ser
        195                 200                 205
Pro Phe Ala Leu Gln Asn Thr Ile Thr Ser Gly Ile Val Ser Ser Ala
    210                 215                 220
Gln Arg Pro Ala Arg Asp Leu Gly Leu Pro Gln Thr Asn Val Glu Tyr
225                 230                 235                 240
Ile Gln Thr Asp Ala Ala Ile Asp Phe Gly Asn Ser Gly Gly Pro Leu
                245                 250                 255
Val Asn Leu Val Ser Glu Thr Ser Phe Leu Pro Arg Ile Pro Ala Pro
            260                 265                 270
Gly Gln Cys Gly Lys Gly Arg Phe Pro Leu Ile Gln Gly Cys Leu Val
        275                 280                 285
Lys Phe Leu Ser Ser Ser Leu Leu Ala Ile Ser Gln Tyr Pro Thr Arg
    290                 295                 300
Ser Pro Gln His Leu Leu Val Leu Leu Phe Gly Cys Pro His Pro Leu
305                 310                 315                 320
Leu Phe Val

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 1503 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

CTTCGGGCAT GGCGGGCTTT GGGGGGCATT CGCTGGGGGA GGAGACCCCG TTTGACCCCT 60
GACCTCCGGG CCCTGCTGAC GTCAGGAACT TCTGACCCCC GGGCCCGAGT GACTTATGGG 120
ACCCCCAGTC TCTGGGCCCG GTTGTCTGTT GGGGTCACTG AACCCCGAGC ATGCCTGACG 180
TCTGGGACCC CGGGTCCCCG GGCACAACTG ACTGCGGTGA CCCCAGATAC CAGGACCCGG 240
GAGGCCTCAG AGAACTCTGG AACCCGTTCG CGCGCGTGGC TGGCGGTGGC GCTGGGCGCT 300
GGGGGGGCAG TGCTGTTGTT GTTGTGGGGC GGGGGTCGGG GTCCTCCGGC CGTCCTCGCC 360
GCCGTCCCTA GCCCGCCGCC CGCTTCTCCC CGGAGTCAGT ACAACTTCAT CGGAGATGTG 420
GTGGAGAAGA CAGCACCTGC CGTGGTCTAT ATCGAGATCC TGGACCGGCA CCCTTTCTTG 480
GGCCGCGAGG TCCCTATCTC GAACGGCTCA GGATTCGTGG TGGCTGCCGA TGGGCTCATT 540
GTCACCAACG CCCATGTGGT GGCTGATCGG CGCAGAGTCC GTGTGAGACT GCTAAGCGGC 600
GACACGTATG AGGCCGTGGT CAGAGCTGTG GATCCCGTGG CAGACATCGC AACGCTGAGG 660
ATTCAGACTA AGGAGCCTCT CCCCACGCTG CCTCTGGGAC GCTCAGCTGA TGTCCGGCAA 720
GGGGAGTTTG TTGTTGCCAT GGGAAGTCCC TTTGCACTGC AGAACACGAT CACATCCGGC 780
ATTGTTAGCT CTGCTCAGCG TCCAGCCAGA GACCTGGGAC TCCCCCAAAC CAATGTGGAA 840
TACATTCAAA CTGATGCAGC TATTGATTTT GGAAACTCTG GAGGTCCCCT GGTTAACCTG 900
GCTAGGGAAC TGGGGGCTGT ATCCCTGCAG GATGGGGAGG TGATTGGAGT GAACACCATG 960
AAGGTCACAG CTGGAATCTC CTTTGCCATC CCTTCTGATC GTCTTCGAGA GTTTCTCCAT 1020
CGTGGGGAAA AGAAGAATTC CTCCTCCGGA ATGAGTGGGT GGGAGGGGGG CTACATTGGG 1080
GTGATGATGC TGACCCTGAG TCCCAGGGCT GGTCTGCGGC CTGGTGATGT GATTTTGGCC 1140
ATTGGGGAGC AGATGGTACA AAATGCTGAA GATGTTTATG AAGCTGTTCG AACCCAATCC 1200
CAGTTGGCAG TGCAGATCCG GCGGGGACGA GAAACACTGA CCTTATATGT GACCCCTGAG 1260
GTCACAGAAT GAATAGATCA CCAAGAGTAT GAGGCTCCTG CTCTGATTTC CTCCTTGCCT 1320
TTCTGGCTGA GGTTCTGAGG GCACCGAGAC AGAGGGTTAA ATGAACCAGT GGGGGCAGGT 1380
CCCTCCAACC ACCAGCACTG ACTCCTGGGC TCTGAAGAAT CACAGAAACA CTTTTTATAT 1440
AAAATAAAAT TATACCTAGC AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 1500
AAA 1503

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 423 amino acids
    (B) TYPE: amino acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Leu Arg Ala Trp Arg Ala Leu Gly Gly Ile Arg Trp Gly Arg Arg Pro
1               5                   10                  15
Arg Leu Thr Pro Asp Leu Arg Ala Leu Leu Thr Ser Gly Thr Ser Asp
            20                  25                  30
Pro Arg Ala Arg Val Thr Tyr Gly Thr Pro Ser Leu Trp Ala Arg Leu
        35                  40                  45
Ser Val Gly Val Thr Glu Pro Arg Ala Cys Leu Thr Ser Gly Thr Pro
    50                  55                  60
Gly Pro Arg Ala Gln Leu Thr Ala Val Thr Pro Asp Thr Arg Thr Arg
65                  70                  75                  80
Glu Ala Ser Glu Asn Ser Gly Thr Arg Ser Arg Ala Trp Leu Ala Val
                85                  90                  95
Ala Leu Gly Ala Gly Gly Ala Val Leu Leu Leu Leu Trp Gly Gly Gly
            100                 105                 110
Arg Gly Pro Pro Ala Val Leu Ala Ala Val Pro Ser Pro Pro Pro Ala
        115                 120                 125
Ser Pro Arg Ser Gln Tyr Asn Phe Ile Ala Asp Val Val Glu Lys Thr
    130                 135                 140
Ala Pro Ala Val Val Tyr Ile Glu Ile Leu Asp Arg His Pro Phe Leu
145                 150                 155                 160
Gly Arg Glu Val Pro Ile Ser Asn Gly Ser Gly Phe Val Val Ala Ala
                165                 170                 175
Asp Gly Leu Ile Val Thr Asn Ala His Val Val Ala Asp Arg Arg Arg
            180                 185                 190
Val Arg Val Arg Leu Leu Ser Gly Asp Thr Tyr Glu Ala Val Val Thr
        195                 200                 205
Ala Val Asp Pro Val Ala Asp Ile Ala Thr Leu Arg Ile Gln Thr Lys
    210                 215                 220
Glu Pro Leu Pro Thr Leu Pro Leu Gly Arg Ser Ala Asp Val Arg Gln
225                 230                 235                 240
Gly Glu Phe Val Val Ala Met Gly Ser Pro Phe Ala Leu Gln Asn Thr
                245                 250                 255
Ile Thr Ser Gly Ile Val Ser Ser Ala Gln Arg Pro Ala Arg Asp Leu
            260                 265                 270
Gly Leu Pro Gln Thr Asn Val Glu Tyr Ile Gln Thr Asp Ala Ala Ile
        275                 280                 285
Asp Phe Gly Asn Ser Gly Gly Pro Leu Val Asn Leu Ala Arg Glu Leu
    290                 295                 300
Gly Ala Val Ser Leu Gln Asp Gly Glu Val Ile Gly Val Asn Thr Met
305                 310                 315                 320
Lys Val Thr Ala Gly Ile Ser Phe Ala Ile Pro Ser Asp Arg Leu Arg
                325                 330                 335
Glu Phe Leu His Arg Gly Glu Lys Lys Asn Ser Ser Ser Gly Ile Ser
            340                 345                 350
Gly Ser Gln Arg Arg Tyr Ile Gly Val Met Met Leu Thr Leu Ser Pro
        355                 360                 365
Arg Ala Gly Leu Arg Pro Gly Asp Val Ile Leu Ala Ile Gly Glu Gln
    370                 375                 380
Met Val Gln Asn Ala Glu Asp Val Tyr Glu Ala Val Arg Thr Gln Ser
385                 390                 395                 400
Gln Leu Ala Val Gln Ile Arg Arg Gly Arg Glu Thr Leu Thr Leu Tyr
                405                 410                 415
Val Thr Pro Glu Val Thr Glu
            420



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 1835 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
    (A) NAME/KEY: Coding Sequence
    (B) LOCATION: 251...1624
    (D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

GGCCGGAAGG GCTAGCGGTC CCAGCATACC CCGCGGCCCC TTGGGCCGTC TCACAACTCG  60
CGTCCGGCGG AGACCACAAT TCCCGGCATT CGTGGGGCAT GGAGGAGTCG GCCTCCCGGA  120
ATCCTGGTCC CGGCGTGCAC TTCTGAAGGA CTTCAGGTAC CGGCGTGCCC CGCGTCCTAC  180
TGTCCGCCTG CTCGCGTCCT GGGTGCCGCC TCTGAGTAGG GCGGGCGAGG AGGCAGCCAA  240
GGCGGAGCTG ATG GCT GCC CCG AGG GCG GGG CGG GGT GCA GGC TGG AGC  289
           Met Ala Ala Pro Arg Ala Gly Arg Gly Ala Gly Trp Ser
           1               5                   10

CTT CGG GCA TGG CGG GCT TTG GGG GGC ATT TGC TGG GGG AGG AGA CCC  337
Leu Arg Ala Trp Arg Ala Leu Gly Gly Ile Cys Trp Gly Arg Arg Pro
    15                  20                  25

CGT TTG ACC CCT GAC CTC CGG GCC CTG CTG ACG TCA GGA ACT TCT GAC  385
Arg Leu Thr Pro Asp Leu Arg Ala Leu Leu Thr Ser Gly Thr Ser Asp
30                  35                  40                  45

CCC CGG GCC CGA GTG ACT TAT GGG ACC CCC AGT CTC TGG GCC CGG TTG  433
Pro Arg Ala Arg Val Thr Tyr Gly Thr Pro Ser Leu Trp Ala Arg Leu
                50                  55                  60

TCT GTT GGG GTC ACT GAA CCC CGA GCA TGC CTG ACG TCT GGG ACC CCG  481
Ser Val Gly Val Thr Glu Pro Arg Ala Cys Leu Thr Ser Gly Thr Pro
            65                  70                  75

GGT CCC CGG GCA CAA CTG ACT GCG GTG ACC CCA GAT ACC AGG ACC CGG  529
Gly Pro Arg Ala Gln Leu Thr Ala Val Thr Pro Asp Thr Arg Thr Arg
        80                  85                  90

GAG GCC TCA GAG AAC TCT GGA ACC CGT TCG CGC GCG TGG CTG GCG GTG  577
Glu Ala Ser Glu Asn Ser Gly Thr Arg Ser Arg Ala Trp Leu Ala Val
    95                  100                 105

GCG CTG GGC GCT GGG GGG GCA GTG CTG TTG TTG TTG TGG GGC GGG GGT  625
Ala Leu Gly Ala Gly Gly Ala Val Leu Leu Leu Leu Trp Gly Gly Gly
110                 115                 120                 125

CGG GGT CCT CCG GCC CTC CTC GCC GCC GTC CCT AGC CCG CCG CCC GCT  673
Arg Gly Pro Pro Ala Val Leu Ala Ala Val Pro Ser Pro Pro Pro Ala
                130                 135                 140

TCT CCC CGG AGT CAG TAC AAC TTC ATC GCA GAT GTG GTG GAG AAG ACA  721
Ser Pro Arg Ser Gln Tyr Asn Phe Ile Ala Asp Val Val Glu Lys Thr
            145                 150                 155

GCA CCT GCC GTG GTC TAT ATC GAG ATC CTG GAC CGG CAC CCT TTC TTG  769
Ala Pro Ala Val Val Tyr Ile Glu Ile Leu Asp Arg His Pro Phe Leu
        160                 165                 170

GGC CGC GAG GTC CCT ATC TCG AAC GGC TCA GGA TTC GTG GTG GCT GCC  817
Gly Arg Glu Val Pro Ile Ser Asn Gly Ser Gly Phe Val Val Ala Ala
    175                 180                 185

GAT GGG CTC ATT GTC ACC AAC GCC CAT GTG GTG GCT GAT CGG CGC AGA  865
Asp Gly Leu Ile Val Thr Asn Ala His Val Val Ala Asp Arg Arg Arg
190                 195                 200                 205

GTC CGT GTG AGA CTG CTA AGC GGC GAC ACG TAT GAG GCC GTG GTC ACA  913
Val Arg Val Arg Leu Leu Ser Gly Asp Thr Tyr Glu Ala Val Val Thr
                210                 215                 220

GCT GTG GAT CCC GTG GCA GAC ATC GCA ACG CTG AGG ATT CAG ACT AAG  961
Ala Val Asp Pro Val Ala Asp Ile Ala Thr Leu Arg Ile Gln Thr Lys
            225                 230                 235

GAG CCT CTC CCC ACG CTG CCT CTG GGA CGC TCA GCT GAT GTC CGG CAA  1009
Glu Pro Leu Pro Thr Leu Pro Leu Gly Arg Ser Ala Asp Val Arg Gln
        240                 245                 250

GGG GAG TTT GTT GTT GCC ATG GGA AGT CCC TTT GCA CTG CAG AAC ACG  1057
Gly Glu Phe Val Val Ala Met Gly Ser Pro Phe Ala Leu Gln Asn Thr
    255                 260                 265

ATC ACA TCC GGC ATT GTT AGC TCT GCT CAG CGT CCA GCC AGA GAC CTG  1105
Ile Thr Ser Gly Ile Val Ser Ser Ala Gln Arg Pro Ala Arg Asp Leu
270                 275                 280                 285

GGA CTC CCC CAA ACC AAT GTG GAA TAC ATT CAA ACT GAT GCA GCT ATT  1153
Gly Leu Pro Gln Thr Asn Val Glu Tyr Ile Gln Thr Asp Ala Ala Ile
                290                 295                 300

GAT TTT GGA AAC TCT GGA GGT CCC CTG GTT AAC CTG GAT GGC GAG GTG  1201
Asp Phe Gly Asn Ser Gly Gly Pro Leu Val Asn Leu Asp Gly Glu Val
            305                 310                 315

ATT GGA GTG AAC ACC ATG AAG GTC ACA GCT GGA ATC TCC TTT GCC ATC  1249
Ile Gly Val Asn Thr Met Lys Val Thr Ala Gly Ile Ser Phe Ala Ile
        320                 325                 330

CCT TCT GAT CGT CTT CGA GAG TTT CTG CAT CGT GGG GAA AAG AAG AAT  1297
Pro Ser Asp Arg Leu Arg Glu Phe Leu His Arg Gly Glu Lys Lys Asn
    335                 340                 345

TCC TCC TCC GGA ATC AGT GGG TCC CAG CGG CGC TAC ATT GGG GTG ATG  1345
Ser Ser Ser Gly Ile Ser Gly Ser Gln Arg Arg Tyr Ile Gly Val Met
350                 355                 360                 365

ATG CTG ACC CTG AGT CCC AGC ATC CTT GCT GAA CTA CAG CTT CGA GAA  1393
Met Leu Thr Leu Ser Pro Ser Ile Leu Ala Glu Leu Gln Leu Arg Glu
                370                 375                 380

CCA AGC TTT CCC GAT GTT CAG CAT GGT GTA CTC ATC CAT AAA GTC ATC  1441
Pro Ser Phe Pro Asp Val Gln His Gly Val Leu Ile His Lys Val Ile
            385                 390                 395

CTG GGC TCC CCT GCA CAC CGG GCT CCT CTC CCC CCT GGT GAT GTG ATT  1489
Leu Gly Ser Pro Ala His Arg Ala Gly Leu Arg Pro Gly Asp Val Ile
        400                 405                 410

TTG GCC ATT GGG GAG CAG ATG GTA CAA AAT GCT GAA GAT GTT TAT GAA  1537
Leu Ala Ile Gly Glu Gln Met Val Gln Asn Ala Glu Asp Val Tyr Glu
    415                 420                 425

GCT GTT CGA ACC CAA TCC CAG TTG GCA GTG CAG ATC CGG CGG GGA CGA  1585
Ala Val Arg Thr Gln Ser Gln Leu Ala Val Gln Ile Arg Arg Gly Arg
430                 435                 440                 445

GAA ACA CTG ACC TTA TAT GTG ACC CCT GAG GTC ACA GAA TGAATAGATC ACC  1637
Glu Thr Leu Thr Leu Tyr Val Thr Pro Glu Val Thr Glu
                450                 455

AAGAGTATGA GGCTCCTGCT CTGATTTCCT CCTTGCCTTT CTGGCTGAGG TTCTGAGGGC  1697
ACCGAGACAG AGGGTTAAAT GAACCAGTCG GCCCACGTCC CTCCAACCAC CAGCACTGAC  1757
TCCTGGGCTC TGAAGAATCA CAGAAACACT TTTTATATAA AATAAAATTA TACCTAGCAA  1817
CATAAAAAAA AAAAAAAA  1835

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 458 amino acids
    (B) TYPE: amino acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE: internal
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Met Ala Ala Pro Arg Ala Gly Arg Gly Ala Gly Trp Ser Leu Arg Ala
1               5                   10                  15
Trp Arg Ala Leu Gly Gly Ile Cys Trp Gly Arg Arg Pro Arg Leu Thr
            20                  25                  30
Pro Asp Leu Arg Ala Leu Leu Thr Ser Gly Thr Ser Asp Pro Arg Ala
        35                  40                  45
Arg Val Thr Tyr Gly Thr Pro Ser Leu Trp Ala Arg Leu Ser Val Gly
    50                  55                  60
Val Thr Glu Pro Arg Ala Cys Leu Thr Ser Gly Thr Pro Gly Pro Arg
65                  70                  75                  80
Ala Gln Leu Thr Ala Val Thr Pro Asp Thr Arg Thr Arg Glu Ala Ser
                85                  90                  95
Glu Asn Ser Gly Thr Arg Ser Arg Ala Trp Leu Ala Val Ala Leu Gly
            100                 105                 110
Ala Gly Gly Ala Val Leu Leu Leu Leu Trp Gly Gly Gly Arg Gly Pro
        115                 120                 125
Pro Ala Val Leu Ala Ala Val Pro Ser Pro Pro Pro Ala Ser Pro Arg
    130                 135                 140
Ser Gln Tyr Asn Phe Ile Ala Asp Val Val Glu Lys Thr Ala Pro Ala
145                 150                 155                 160
Val Val Tyr Ile Glu Ile Leu Asp Arg His Pro Phe Leu Gly Arg Glu
                165                 170                 175
Val Pro Ile Ser Asn Gly Ser Gly Phe Val Val Ala Ala Asp Gly Leu
            180                 185                 190






lie 


Val 


Thr 


Asn 


Ala 


His 


Val 


Val 


Ala 


Asp 


Arg 


Arg 


Arg 


Val 


Arg 


Val 






195 










200 










205 








Arg 


Leu 


Leu 


Ser 


Gly 


Asp 


Thr 


Tyr 


Glu 


Ala 


Val 


Val 


Thr 


Ala 


val 


Asp 




210 










215 










220 










Pro 


Val 


Ala 


Asp 


lie 


Ala 


Thr 


Leu 


Arg 


He 


Gin 


Thr 


Lys 


Glu 


Pro 


Leu 


225 










230 










235 










240 


Pro 


Thr 


Leu 


Pro 


Leu 


Gly 


Arg 


Ser 


Ala 


Asp 


Val 


Arg 


Gin 


G1 y 


Glu 


Phe 










245 










250 










255 




val 


val 


Ala 


Met 


Gly 


Ser 


Pro 


Fhe 


Ala 


Leu 


Gin 


Asn 


Thr 


He 


Thr 


Ser 








250 










265 










270 






Gly 


Tie 


Val 


Ser 


Scr 


Ala 


Gin 


Arg 


Pro 


Ala 


Arg 


Asp 


Leu 


Gly 


Leu 


Pro 






275 










280 










285 








Gin 


Thr 


Asn 


Val 


Glu 


Tyr 


He 


Gin 


Thr 


Asp 


Ala 


Ala 


He 


Asp 


Phe 


Gly 




290 










295 










300 










Asn 


Ser 


Gly 


Gly 


Pro 


Leu 


Val 


Asn 


Leu 


Asp 


Gly 


Glu 


Val 


He 


Gly 


val 


305 










310 










315 










320 


Asn 


Thr 


Met 


Lys 


val 


Thr 


Ala 


Gly 


He 


Ser 


Phe 


Ala 


He 


Pro 


Ser 


Asp 










325 










330 










335 




Arg 


Leu 


Arg 


Glu 


Phe 


Leu 


His 


Arg 


Gly 


Glu 


Lys 


Lys 


Asn 


ser 


Ser 


Ser 








340 










345 










350 






Gly 


He 


Ser 


Gly 


Ser 


Gin 


Arg 


Arg 


Tyr 


He 


Gly 


Val 


Met 


Met 


Leu 


Thr 






355 










360 










365 








Leu 


Ser 


Pro 


Ser 


He 


Leu 


Ala 


Glu 


Leu 


Gin 


Leu' 


Arg 


Glu 


pro 


Sex 


Phe 




370 










375 










380 










Pro 


Asp 


Val 


Gin 


His 


Gly 


Val 


Leu 


He 


His 


Lys 


Val 


He 


Leu 


Gly 


Ser 


385 










390 










395 










400 


Pro 


Ala 


His 


Arg 


Ala 


Gly 


Leu 


Arg 


Pro 


Gly 


Asp 


Val 


He 


Leu 


Ala 


He 










405 










410 










415 




Gly 


Glu 


Gin 


Met 


Val 


Gin 


Asn 


Al a 


Glu 


Asp 


val 


Tyr 


Glu 


Ala 


Val 


Arg 








420 










425 










430 






Thr 


Gin 


Ser 


Gin 


Leu 


Ala 


val 


Gin 


He 


Arg 


Arg 


Gly Arg 


Glu 


Thr 


Leu 






435 










440 










445 








Thr 


Leu 


Tyr 


Val 


Thr 


Pro 


Glu 


Val 


Thr 


Glu 















450 455 



(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2764 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

TGGGACA3GC AGCTCCGGGG TCCGCGGTTT CACATCGGAA ACAAAACAGC GGCTGGTCTG 60 

CAAGGAACCT GAGCTACGAG CCGCGGCGGC AGCGGGGCGG CGGGGAAGCG TATACCTAAT 120 

CTGGGAGCCT GCAAGTGACA ACAGCCTTTG CGGTCCTTAG ACAGCTTGGC CTGGAGGAGA 180

ACACATGAAA GAAAGAACCT CAAGAGGCTT TGTTTTCTGT GAAACAGTAT TTCTATACAG 240

TTGCTCCAAT GACAGAGTTA CCTGCACCGT TGTCCTACTT CCAGAATGCA CAGATGTCTG 300 

AGGACAACCA CCTGAGCAAT ACTGTACGTA GCCAGAATGA CAATAGAGAA CGGCAGGAGC 360

ACAACGACAG ACGGAGCCTT GGCCACCCTG AGCCATTATC TAATGGACGA CCCCAGGGTA 420

ACTCCCGGCA GGTGGTGGAG CAAGATGAGG AAGAAGATGA GGAGCTGACA TTGAAATATG 480

GCGCCAAGCA TGTGATCATG CTCTTTGTCC CTGTGACTCT CTGCATGGTG GTGGTCGTGG 540 

CTACCATTAA GTCAGTCAGC TTTTATACCC GGAAGGATGG GCAGCTAATC TATACCCCAT 600

TCACAGAAGA TACCGAGACT GTGGGCCAGA GAGCCCTGCA CTCAATTCTG AATGCTGCCA 660

TCATGATCAG TGTCATTGTT GTCATCACTA TCCTCCTGGT GGTTCTGTAT AAATACAGGT 720

GCTATAAGGT CATCCATGCC TGGCTTATTA TATCATCTCT ATTGTTGCTG TTCTTTTTTT 780 

CATTCATTTA CTTGGGGGAA GTGTTTAAAA CCTATAACGT TGCTGTGGAC TACATTACTG 840 

TTGCACTCCT GATCTGGAAT TTTGGTGTGG TGGGAATGAT TTCCATTCAC TGGAAAGGTC 900 

CACTTCGACT CCAGCAGGCA TATCTCATTA TGATTAGTGC CCTCATGGCC CTGGTGTTTA 960

TCAAGTACCT CCCTGAATGG ACTGCGTGGC TCATCTTGGC TGTGATTTCA GTATATGATT 1020 

TAGTGGCTGT TTTGTGTCCG AAAGGTCCAC TTCGTATGCT GGTTGAAACA GCTCAGGAGA 1080 

GAAATGAAAC GCTTTTTCCA GCTCTCATTT ACTCCTCAAC AATGGTGTGG TTGGT3AATA 1140 

TGGCAGAAGG AGACCCGGAA GCTCAAAGGA GAGTATCCAA AAATTCCAAG TATAATGCAG 1200 

AAAGCACAGA AAGGGAGTCA CAAGACACTG TTGCAGAGAA TGATGATGGC GGGTTCAGTG 1260

AGGAATGGGA AGCCCAGAGG GACAGTCATC TAGGGCCTCA TCGCTCTACA CCTGAGTCAC 1320

GAGCTGCTGT CCAGGAACTT TCCAGCAGTA TCCTCGCTGG TGAAGACCCA GAGGAAAGGG 1380 

GAGTAAAACT TGGATTGGGA GATTTCATTT TCTACAGTGT TCTGGTTGG" AAAGCCTCAG 1440 

CAACAGCCAG TGGAGACTGG AACACAACCA TAGCCTGTTT CGTAGCCATA TTAATTGGTT 1500

TGTGCCTTAC ATTATTACTC CTTGCCATTT TCAAGAAAGC ATTGCCAGCT CTTCCAATCT 1560 

CCATCACCTT TGGGCTTGTT TTCTACTTTG CCACAGATTA TCTTGTAGAG CCTTTTATGG 1620

ACCAATTAGC ATTCCATCAA TTTTATATCT AGCATATTTG CGGTTAGAAT CCCATGGATG 1680

TTTCTTCTTT GACTATAACC AAATCTGGGG AGGACAAAGG TGATTTTCCT GTGTCCACAT 1740 

CTAACAAAGT CAAGATTCCC GGCTGGACTT TTGCAGCTTC CTTCCAAGTC TTCCTGACCA 1800

CCTTGCACTA TTGGACTTTG GAAGGAGGTG CCTATAGAAA ACGATTTTGA ACATACTTCA 1860

TCGCAGTGGA CTGTGTCCCT CGGTGCAGAA ACTACCAGAT TTGAGGGACG AGGTCAAGGA 1920 

GATATGATAG GCCCGGAAGT TGCTGTGCCC CATCAGCAGC TTGACGCGTG GTCACAGGAC 1980 

GATTTCACTG ACACTGCGAA CTCTCAGGAC TACCGGTTAC CAAGAGCTTA GGTGAAGTGG 2040

TTTAAACCAA ACGGAACTCT TCATCTTAAA CTACACGTTG AAAATCAACC CAATAATTCT 2100

GTATTAACTG AATTCTGAAC TTTTCAGGAG GTACTGTGAG GAAGAGCAGG CACCAGCAGC 2160 

AGAATGGGGA ATGGAGAGGT GGGCAGGGGT TCCAGCTTCC CTTTGATTTT TTGCTGCAGA 2220 

CTCATCCTTT TTAAATGAGA CTTGTTTTCC CCTCTCTTTG AGTCAAGTCA AATATGTAGA 2280 

TTGCCTTTGG CAATTCTTCT TCTCAAGCAC TGACACTCAT TACCGTCTGT GATTGCCATT 2340

TCTTCCCAAG GCCAGTCTGA ACCTGAGGTT CCTTTATCCT AAAAGTTTTA ACCTCAGGTT 2400

CCAAATTCAG TAAATTTTGG AAACAGTACA GCTATTTCTC ATCAATTCTC TATCATGTTG 2460 

AAGTCAAATT TGGATTTTCC ACCAAATTCT GAATTTGTAG ACATACTTGT ACGCTCACTT 2520

GCCCCCAGAT GCCTCCTCTG TCCTCATTCT TCTCTCCCAC ACAAGCAGTC TTTTTCTACA 2580 

GCCAGTAAGG CAGCTCTGTC RTGGTAGCAG ATGGTCCCAT TATTCTAGGG TCTTACTCTT 2640

TGTATGATGA AAAGAATGTG TTATGAATCG GTGCTGTCAG CCCTGCTGTC AGACCTTCTT 2700 

CCACAGCAAA TGAGATGTAT GCCCAAAGCG GTAGAATTAA AGAAGAGTAA AATGGCTGTT 2760

GAAG 2764




(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 467 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Met Thr Glu Leu Pro Ala Pro Leu Ser Tyr Phe Gln Asn Ala Gln Met
1 5 10 15

Ser Glu Asp Asn His Leu Ser Asn Thr Val Arg Ser Gln Asn Asp Asn
20 25 30

Arg Glu Arg Gln Glu His Asn Asp Arg Arg Ser Leu Gly His Pro Glu
35 40 45

Pro Leu Ser Asn Gly Arg Pro Gln Gly Asn Ser Arg Gln Val Val Glu
50 55 60

Gln Asp Glu Glu Glu Asp Glu Glu Leu Thr Leu Lys Tyr Gly Ala Lys
65 70 75 80

His Val Ile Met Leu Phe Val Pro Val Thr Leu Cys Met Val Val Val
85 90 95



EP 0 828 003 A2



Val Ala Thr Ile Lys Ser Val Ser Phe Tyr Thr Arg Lys Asp Gly Gln
100 105 110

Leu Ile Tyr Thr Pro Phe Thr Glu Asp Thr Glu Thr Val Gly Gln Arg
115 120 125

Ala Leu His Ser Ile Leu Asn Ala Ala Ile Met Ile Ser Val Ile Val
130 135 140

Val Met Thr Ile Leu Leu Val Val Leu Tyr Lys Tyr Arg Cys Tyr Lys
145 150 155 160

Val Ile His Ala Trp Leu Ile Ile Ser Ser Leu Leu Leu Leu Phe Phe
165 170 175

Phe Ser Phe Ile Tyr Leu Gly Glu Val Phe Lys Thr Tyr Asn Val Ala
180 185 190

Val Asp Tyr Ile Thr Val Ala Leu Leu Ile Trp Asn Phe Gly Val Val
195 200 205

Gly Met Ile Ser Ile His Trp Lys Gly Pro Leu Arg Leu Gln Gln Ala
210 215 220

Tyr Leu Ile Met Ile Ser Ala Leu Met Ala Leu Val Phe Ile Lys Tyr
225 230 235 240

Leu Pro Glu Trp Thr Ala Trp Leu Ile Leu Ala Val Ile Ser Val Tyr
245 250 255

Asp Leu Val Ala Val Leu Cys Pro Lys Gly Pro Leu Arg Met Leu Val
260 265 270

Glu Thr Ala Gln Glu Arg Asn Glu Thr Leu Phe Pro Ala Leu Ile Tyr
275 280 285

Ser Ser Thr Met Val Trp Leu Val Asn Met Ala Glu Gly Asp Pro Glu
290 295 300

Ala Gln Arg Arg Val Ser Lys Asn Ser Lys Tyr Asn Ala Glu Ser Thr
305 310 315 320

Glu Arg Glu Ser Gln Asp Thr Val Ala Glu Asn Asp Asp Gly Gly Phe
325 330 335

Ser Glu Glu Trp Glu Ala Gln Arg Asp Ser His Leu Gly Pro His Arg
340 345 350

Ser Thr Pro Glu Ser Arg Ala Ala Val Gln Glu Leu Ser Ser Ser Ile
355 360 365

Leu Ala Gly Glu Asp Pro Glu Glu Arg Gly Val Lys Leu Gly Leu Gly
370 375 380

Asp Phe Ile Phe Tyr Ser Val Leu Val Gly Lys Ala Ser Ala Thr Ala
385 390 395 400

Ser Gly Asp Trp Asn Thr Thr Ile Ala Cys Phe Val Ala Ile Leu Ile
405 410 415

Gly Leu Cys Leu Thr Leu Leu Leu Leu Ala Ile Phe Lys Lys Ala Leu
420 425 430

Pro Ala Leu Pro Ile Ser Ile Thr Phe Gly Leu Val Phe Tyr Phe Ala
435 440 445

Thr Asp Tyr Leu Val Gln Pro Phe Met Asp Gln Leu Ala Phe His Gln
450 455 460

Phe Tyr Ile
465

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

CGGAATTCCG TATGCTGGTT GAAACA 26

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

CGGGATCCTC AGGCTACGAA ACAGCCTAT 29 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1854 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

TATATCAGCG GTATGACCGA CCTCTATGCG TGGGATGAAT ACCGACGTCT GATGGCCGTA 60 

GAACAATAAC CAGGCTTTTG TAAAGACGAA CAATAAATTT TTACCTTTTG CAGAAACTT7 120

AGTTCGGAAC TTCAGGCTAT AAAACGAATC TGAAGAACAC AGCAATTTTG CGTTATCTGT 180

TAATCGAGAC TGAAATACAT GAAAAAAACC ACATTAGCAC TGAGTCGACT GGCTCTGAGT 240

TTAGGTTTGG CGTTATCTCC GCTCTCTGCA ACGGCGGCTG AGACTTCTTC AGCAACGACA 300

GCCCAGCAGA TGCCAAGCCT TGCACCGATG CTCGAAAAGG TGATGCCTTC AGTGGTCAGC 360 

ATTAACGTAG AAGGTAGCAC AACCGTTAAT ACGCCGCGTA TGCCGCGTAA TTTCCAGCAG 420

TTCTTCGGTG ATGATTCTCC GTTCTGCCAG GAAGGTTCTC CGTTCCAGAG CTCTCCGTTC 480

TGCCAGGGTG GCCAGGGCGG TAATGGTGGC GGCCACCAAC AGAAATTCAT GGCGCTGGGT 540

TCCGGCGTCA TCATTGATGC CGATAAAGGC TATGTCGTCA CCAACAACCA CGTTGTTGAT 600

AACGCGACGG TCATTAAAGT TCAACTGAGC GATGGCCGTA AGTTCGACGC GAAGATGGTT 660

GGCAAAGATC CGCGCTCTGA TATCGCGCTG ATCCAAATCC AGAACCCGAA AAACCTGACC 720 

GCAATTAAGA TGGCGGATTC TGATGCACTG CGCGTGGGTG ATTACACCGT AGGGATTGGT 780

AACCCGTTTG GTCTGGGCGA GACGGTAACT TCCGGGATTG TCTCTGCGCT GGGGCGTAGC 840

GGCCTGAATG CCGAAAACTA CGAAAACTTC ATCCAGACCG ATGCAGCGAT CAACCGTGGT 900

AACTCCGGTG GTGCGCTGGT TAACCTGAAC GGCGAACTGA TCGGTATCAA CACCGCGATC 960

CTCGCACCGG ACGGCGGCAA CATCGGTATC GGTTTTGCTA TCCCGAGTAA CATGGTGAAA 1020

AACCTGACCT CGCAGATGGT GGAATACGGC CAGGTGAAAC GCGGTGAGCT GGGTATTATG 1080 

GGGACTGAGC TGAACTCCGA ACTGGCGAAA GCGATGAAAG TTGACGCCCA GCGCGGTGCT 1140 

TTCGTAAGCC AGGTTCTGCC TAATTCCTCC GCTGCAAAAG CGGGCATTAA AGCGGGTGAT 1200 

GTGATCACCT CACTGAACGG TAAGCCGATC AGCAGCTTTG CCGCACTGCG TGCTCAGGTG 1260

GGTACTATGC CGGTAGGCAG CAAACTGACC CTGGGCTTAC TGCGCGACGG TAAGCAGGTT 1320

AACGTGAACC TGGAACTGCA GCAGAGCAGC CAGAATCAGG TTGATTCCAG CTCCATCTTC 1380

AACGGCATTG AAGGCGCTGA GATGAGCAAC AAAGGCAAAG ATCAGGGCGT GGTAGTGAAC 1440

AACGTGAAAA CGGGCACTCC GGCTGCGCAG ATCGGCCTGA AGAAAGGTGA TGTGATTATT 1500

GGCGCGAACC AGCAGGCAGT GAAAAACATC GCTGAACTGC GTAAAGTTCT CGACAGCAAA 1560



CCGTCTGTGC TGGCACTCAA CATTCAGCGC CGCGACCGCC ATCTACCTGT TAATGCAGTS 1620

ATCTCCCTCA ACCCCTTCCT GAAAACGGGA AGGGGTTCTC CTTACAATCT GTGAACTTCA 1680

CCACAACTCC ATACATCTTC ATCATCCTTT AGGCATTTGC ACAATGCCGT ACGTTACGTA 1740

CTTCCTTATG CTAAGCCGTG CATAACGGAG GACTTATGGC TGGCTGCCAT CTTGATACCA 1800

AAATGGCGCA GGATATCGTG GCACGTACCA TGCGCATCAT CGATACCAAT ATCA 1854

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 491 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Met Lys Lys Thr Thr Leu Ala Leu Ser Arg Leu Ala Leu Ser Leu Gly
1 5 10 15

Leu Ala Leu Ser Pro Leu Ser Ala Thr Ala Ala Glu Thr Ser Ser Ala
20 25 30

Thr Thr Ala Gln Gln Met Pro Ser Leu Ala Pro Met Leu Glu Lys Val
35 40 45

Met Pro Ser Val Val Ser Ile Asn Val Glu Gly Ser Thr Thr Val Asn
50 55 60

Thr Pro Arg Met Pro Arg Asn Phe Gln Gln Phe Phe Gly Asp Asp Ser
65 70 75 80

Pro Phe Cys Gln Glu Gly Ser Pro Phe Gln Ser Ser Pro Phe Cys Gln
85 90 95

Gly Gly Gln Gly Gly Asn Gly Gly Gly Gln Gln Gln Lys Phe Met Ala
100 105 110

Leu Gly Ser Gly Val Ile Ile Asp Ala Asp Lys Gly Tyr Val Val Thr
115 120 125

Asn Asn His Val Val Asp Asn Ala Thr Val Ile Lys Val Gln Leu Ser
130 135 140

Asp Gly Arg Lys Phe Asp Ala Lys Met Val Gly Lys Asp Pro Arg Ser
145 150 155 160

Asp Ile Ala Leu Ile Gln Ile Gln Asn Pro Lys Asn Leu Thr Ala Ile
165 170 175

Lys Met Ala Asp Ser Asp Ala Leu Arg Val Gly Asp Tyr Thr Val Gly
180 185 190

Ile Gly Asn Pro Phe Gly Leu Gly Glu Thr Val Thr Ser Gly Ile Val
195 200 205

Ser Ala Leu Gly Arg Ser Gly Leu Asn Ala Glu Asn Tyr Glu Asn Phe
210 215 220

Ile Gln Thr Asp Ala Ala Ile Asn Arg Gly Asn Ser Gly Gly Ala Leu
225 230 235 240

Val Asn Leu Asn Gly Glu Leu Ile Gly Ile Asn Thr Ala Ile Leu Ala
245 250 255

Pro Asp Gly Gly Asn Ile Gly Ile Gly Phe Ala Ile Pro Ser Asn Met
260 265 270

Val Lys Asn Leu Thr Ser Gln Met Val Glu Tyr Gly Gln Val Lys Arg
275 280 285

Gly Glu Leu Gly Ile Met Gly Thr Glu Leu Asn Ser Glu Leu Ala Lys
290 295 300

Ala Met Lys Val Asp Ala Gln Arg Gly Ala Phe Val Ser Gln Val Leu
305 310 315 320

Pro Asn Ser Ser Ala Ala Lys Ala Gly Ile Lys Ala Gly Asp Val Ile
325 330 335

Thr Ser Leu Asn Gly Lys Pro Ile Ser Ser Phe Ala Ala Leu Arg Ala
340 345 350

Gln Val Gly Thr Met Pro Val Gly Ser Lys Leu Thr Leu Gly Leu Leu
355 360 365

Arg Asp Gly Lys Gln Val Asn Val Asn Leu Glu Leu Gln Gln Ser Ser
370 375 380

Gln Asn Gln Val Asp Ser Ser Ser Ile Phe Asn Gly Ile Glu Gly Ala
385 390 395 400

Glu Met Ser Asn Lys Gly Lys Asp Gln Gly Val Val Val Asn Asn Val
405 410 415

Lys Thr Gly Thr Pro Ala Ala Gln Ile Gly Leu Lys Lys Gly Asp Val
420 425 430

Ile Ile Gly Ala Asn Gln Gln Ala Val Lys Asn Ile Ala Glu Leu Arg
435 440 445

Lys Val Leu Asp Ser Lys Pro Ser Val Leu Ala Leu Asn Ile Gln Arg
450 455 460

Arg Asp Arg His Leu Pro Val Asn Ala Val Ile Ser Leu Asn Pro Phe
465 470 475 480

Leu Lys Thr Gly Arg Gly Ser Pro Tyr Asn Leu
485 490

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
CTGGATGGGG AGGTGATTGG AGTG 24 
(2) INFORMATION FOR SEQ ID NO: 16: 



(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

GTCTCTGGGC CCCGGTTGTC TGTTG 25

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 2036 base pairs 
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
Feature polymorphism at 1325

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

CCGGCCCTCG CCCTGTCCGC CGCCACCGCC GCCGCCGCCA GAGTCGCCAT GCAGATCCCG 60 

CGCGCCGCTC TTCTCCCGCT GCTGCTGCTG CTGCTGGCGG CGCCCGCCTC GGCGCAGCTG 120 

TCCCGGGCCG GCCGCTCGGC GCCTTTGSCC GCCGGGTGCC CAGACCGCTG CGAGCCGGCG 180

CGCTGCCCGC CGCAGCCGGA GCACTGCGAG GGCGGCCGGG CCCGGGACGC GTGCGGCTGC 240

TGCGAGGTGT GCGGCGCGCC CGAGGGCGCC GCGTGCGGCC TGCAGGAGGG CCCGTGCGGC 300

GAGGGGCTGC AGTGCGTGGT GCCCTTCGGG GTGCCAGCCT CGGCCACGGT GCGGCGGCGC 360 

GCGCAGGCCG GCCTCTGTGT GTGCGCCAGC AGCGAGCCGG TGTGCGGCAG CGACGCCAAC 420 

ACCTACGCCA ACCTGTGCCA GCTGCGCGCC GCCAGCCGCC GCTCCGAGAG GCTGCACCGG 480 

CCGCCGGTCA TCGTCCTGCA GCGCGGAGCC TGCGGCCAAG GGCAGGAAGA TCCCAACAGT 540 

TTGCGCCATA AATATAACTT TATCGCGGAC GTGGTGGAGA AGATCGCCCC TGCCGTGGTT 600 

CATATCGAAT TGTTTCGCAA GCTTCCGTTT TCTAAACGAG AGGTGCCGGT GGCTAGTGGG 660 

TCTGGGTTTA TTGTGTCGGA AGATGGACTG ATCGTGACAA ATGCCCACGT GGTGACCAAC 720

AAGCACCGGG TCAAAGTTGA GCTGAAGAAC GGTGCCACTT ACGAAGCCAA AATCAAGGAT 780

GTGGATGAGA AAGCAGACAT CGCACTCATC AAAATTGACC ACCAGGGCAA GCTGCCTGTC 840

CTGCTGCTTG GCCGCTCCTC AGAGCTGCGG CCGGGAGAGT TCGTGGTCGC CATCGGAAGC 900 

CCGTTTTCCC TTCAAAACAC AGTCACCACC GGGATCGTGA GCACCACCCA GCGAGGCGGC 960 

AAAGAGCTGG GGCTCCGCAA CTCAGACATG GACTACATCC AGACCGACGC CATCATCAAC 1020 

TATGGAAACT CGGGAGGCCC GTTAGTAAAC CTGGACGGTG AAGTGATTGG AATTAACACT 1080 

TTGAAAGTGA CAGCTGGAAT CTCCTTTGCA ATCCCATCTG ATAAGATTAA AAAGTTCCTC 1140

ACGGAGTCCC ATGACCGACA GGCCAAAGGA AAAGCCATCA CCAAGAAGAA GTATATTGGT 1200 

ATCCGAATGA TGTCACTCAC GTCCAGCAAA GCCAAAGAGC TGAAGGACCG GCACCGGGAC 1260 

TTCCCAGACG TGATCTCAGC AGCGTATATA ATTGAAGTAA TTCCTGATAC CCCAGCAGAA 1320 

GCTGKTGGTC TCAAGGAAAA CGACGTCATA ATCAGCATCA ATGGACAGTC CGTGGTCTCC 1380

GCCAATGATG TCAGCGACGT CATTAAAAGG GAAAGCACCC TGAACATGGT GGTCCGCAGG 1440

GGTAATGAAG ATATCATGAT CACAGTGATT CCCGAAGAAA TTGACCCATA GGCAGAGGCA 1500

TGAGCTGGAC TTCATGTTTC CCTCAAAGAC TCTCCCGTGG ATGACGGATG AGGACTCTGG 1560

GCTGCTGGAA TAGGACACTC AAGACTTTTG ACTGCCATTT TGTTTGTTCA GTGGAGACTC 1620

CCTGGCCAAC AGAATCCTTC TTGATAGTTT GCAGGCAAAA CAAATGTAAT GTTGCAGATC 1680

CGCAGGCAGA AGCTCTGCCC TTCTGTATCC TATGTATGCA GTGTGCTTTT TCTTGCCAGC 1740 

TTGGGCCATT CTTGCTTAGA CAGTCAGCAT TTGTCTCCTC CTTTAACTGA GTCATCATCT 1800

TAGTCCAACT AATGCAGTCG ATACAATGCG TAGATAGAAG AAGCCCCACG GGAGCCAGGA 1860

TGGGACTGGT CGTGTTTGTG CTTTTCTCCA AGTCAGCACC CAAAGGTCAA TGCACAGAGA 1920

CCCCGGGTGG GTGAGCGCTG GCTTCTCAAA CGGCCGAAGT TGCCTCTTTT AGGAATCTCT 1980

TTGGAATTGG GAGCACGATG ACTCTGAGTT TGAGCTATTA AAGTACTTCT TACAAA 2036

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 480 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE:
Feature - 213 Gly/Val polymorphism

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Met Gln Ile Pro Arg Ala Ala Leu Leu Pro Leu Leu Leu Leu Leu Leu
1 5 10 15

Ala Ala Pro Ala Ser Ala Gln Leu Ser Arg Ala Gly Arg Ser Ala Pro
20 25 30

Leu Ala Ala Gly Cys Pro Asp Arg Cys Glu Pro Ala Arg Cys Pro Pro
35 40 45

Gln Pro Glu His Cys Glu Gly Gly Arg Ala Arg Asp Ala Cys Gly Cys
50 55 60

Cys Glu Val Cys Gly Ala Pro Glu Gly Ala Ala Cys Gly Leu Gln Glu
65 70 75 80

Gly Pro Cys Gly Glu Gly Leu Gln Cys Val Val Pro Phe Gly Val Pro
85 90 95

Ala Ser Ala Thr Val Arg Arg Arg Ala Gln Ala Gly Leu Cys Val Cys
100 105 110

Ala Ser Ser Glu Pro Val Cys Gly Ser Asp Ala Asn Thr Tyr Ala Asn
115 120 125

Leu Cys Gln Leu Arg Ala Ala Ser Arg Arg Ser Glu Arg Leu His Arg
130 135 140

Pro Pro Val Ile Val Leu Gln Arg Gly Ala Cys Gly Gln Gly Gln Glu
145 150 155 160

Asp Pro Asn Ser Leu Arg His Lys Tyr Asn Phe Ile Ala Asp Val Val
165 170 175

Glu Lys Ile Ala Pro Ala Val Val His Ile Glu Leu Phe Arg Lys Leu
180 185 190

Pro Phe Ser Lys Arg Glu Val Pro Val Ala Ser Gly Ser Gly Phe Ile
195 200 205

Val Ser Glu Asp Xaa Leu Ile Val Thr Asn Ala His Val Val Thr Asn
210 215 220

Lys His Arg Val Lys Val Glu Leu Lys Asn Gly Ala Thr Tyr Glu Ala
225 230 235 240

Lys Ile Lys Asp Val Asp Glu Lys Ala Asp Ile Ala Leu Ile Lys Ile
245 250 255

Asp His Gln Gly Lys Leu Pro Val Leu Leu Leu Gly Arg Ser Ser Glu
260 265 270

Leu Arg Pro Gly Glu Phe Val Val Ala Ile Gly Ser Pro Phe Ser Leu
275 280 285

Gln Asn Thr Val Thr Thr Gly Ile Val Ser Thr Thr Gln Arg Gly Gly
290 295 300

Lys Glu Leu Gly Leu Arg Asn Ser Asp Met Asp Tyr Ile Gln Thr Asp
305 310 315 320

Ala Ile Ile Asn Tyr Gly Asn Ser Gly Gly Pro Leu Val Asn Leu Asp
325 330 335

Gly Glu Val Ile Gly Ile Asn Thr Leu Lys Val Thr Ala Gly Ile Ser
340 345 350

Phe Ala Ile Pro Ser Asp Lys Ile Lys Lys Phe Leu Thr Glu Ser His
355 360 365

Asp Arg Gln Ala Lys Gly Lys Ala Ile Thr Lys Lys Lys Tyr Ile Gly
370 375 380

Ile Arg Met Met Ser Leu Thr Ser Ser Lys Ala Lys Glu Leu Lys Asp
385 390 395 400

Arg His Arg Asp Phe Pro Asp Val Ile Ser Gly Ala Tyr Ile Ile Glu
405 410 415

Val Ile Pro Asp Thr Pro Ala Glu Ala Gly Gly Leu Lys Glu Asn Asp
420 425 430

Val Ile Ile Ser Ile Asn Gly Gln Ser Val Val Ser Ala Asn Asp Val
435 440 445

Ser Asp Val Ile Lys Arg Glu Ser Thr Leu Asn Met Val Val Arg Arg
450 455 460

Gly Asn Glu Asp Ile Met Ile Thr Val Ile Pro Glu Glu Ile Asp Pro
465 470 475 480

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

ATGCTGAACA TCGGGAAAGC TTGGTTCTCG 30

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

CCAACAGACA ACCGGGCCCA GAGACT 26 

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

TGCCTCCTCG CCCGCCCTAC TCAGA 25

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 587 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

CGTGGATCCC GAGAAAGAGG CGCAGGACGA GGAGGCAGAA CCCGACTGGC GCGTAGAGCA 60 

GCAGCACGAG CAGTAGGAAG CAGTCACCCG GAAGCCTGGG GGCGAGAGGC GAAGTGGTCA 120

GGCGCCGAAG GCCGAGAGCA CGCGGGGATC GGTCTCTTCC CGCCGGGTCT CTTACCGGTG 180

CGAGTCAAAG AGCCGCTCCG GCCCCGGCCC TGAGGGAAGC TCCATAACTG CTGCTTCAGG 240

AGCGCCCGGC CGTCGCCGCC GCCGCCATTT TCGCGCCCGG CCGCAGGGGC TCTTGGGAAG 300

GCGGAGTCTT TGGGCATCCG CCCGGGGTGA GGGGACCCGA AGTCCTGAGG CGCGCCGGAA 360

GGGCTAGCGG TCCCAGCATA CCCCGCGGCC CCTTGGGCCG TCTCACAACT CGCGTCCGGC 420

GGAGACCACA ATTCCCGGCA TTCGTGGGGC AGGGAGGAGT CGGCCTCCCG GAATCCTGGT 480

CCCGGCGTGC ACTTCTGAAG GACTTCAGGT ACCGGCGTGC CCCGCGTCCT ACTGTCCGCC 540

TGCTCGCGTC CTGGGTGCCG CCTCTGAGTA GGGCGGGCGA GGAGGCA 587

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2187 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 603...1976
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

CGTGGATCCC GAGAAAGAGG CGCAGGACGA GGAGGCAGAA CCCGACTGGC GCGTAGAGCA 60

GCAGCACGAG CAGTAGGAAG CAGTCACCCG GAACCCTGGG GGCGAGAGGC CAAGTGGTCA 120

GGCGCCGAAG GCCGAGAGCA CGCGGGGATC GGTCTCTTCC CGCCGGGTCT CTTACCGGTG 180

CGAGTCAAAG AGCCGCTCCG GCCCCGGCCC TGAGGGAAGC TCCATAACTG CTGCTTCAGG 240

AGCGCCCGGC CGTCGCCGCC GCCGCCATTT TCGCGCCCGG CCGCAGGGGC TCTTGGGAAG 300

GCGGAGTCTT TGGGCATCCG CCCGGGGTGA GGGGACCCGA AGTCCTGAGG CGCGCCGGAA 360

GGGCTAGCGG TCCCAGCATA CCCCGCGGCC CCTTGGGCCG TCTCACAACT CGCGTCCGGC 420

GGAGACCACA ATTCCCGGCA TTCGTGGGGC AGGGAGGAGT CGGCCTCCCG GAATCCTGGT 480

CCCGGCGTGC ACTTCTGAAG GACTTCAGGT ACCGGCGTGC CCCGCGTCCT ACTGTCCGCC 540

TGCTCGCGTC CTGGGTGCCG CCTCTGAGTA GGGCGGGCGA GGAGGCAGCC AAGGCGGAGC 600

TO ATG GCT GCG CCG AGG GCG GGG CGG GGT GCA GGC TGG AGC CTT CGG 647 
Met Ala Ala Pro Arg Ala Gly Arg Gly Ala Gly Trp Ser Leu Arg 
1 5 10 15 



35 



GCA TGG CGG GCT TTG GGG GGC ATT TGC TGG GGG AGG AGA CCC CGT TTG 
Ala Trp Arg Ala Leu Gly Gly lie Cys Trp Gly Arg Arg Pro Arg Leu 
20 25 30 



695 



40 



ACC CCT GAC CTC CGG GCC CTG CTG ACG TCA GGA ACT TCT GAC CCC CGG 743 
Thr Pro Asp Leu Arg Ala Leu Leu Thr Ser Gly Thr Ser Asp Pro Arg 
35 40 45 

GCC CGA GTG ACT TAT GGG ACC CCC AGT CTC TGG GCC CGG TTG TCT GTT 791 
Ala Arg val Thr Tyr Gly Thr Pro Ser Leu Trp Ala Arg Leu Ser Val 
50 55 60 



4S 



GGG GTC ACT GAA CCC CGA GCA TGC CTG ACG TCT GGG ACC CCG GGT CCC 
Gly Val Thr Glu Pro Arg Ala Cys Leu Thr Ser Gly Thr pro Gly Pro 
65 70 75 



839 



CGG GCA CAA CTG ACT GCG GTG ACC CCA GAT ACC AGG ACC CGG GAG GCC 
Arg Ala Gin Leu Thr Ala Val Thr Pro Asp Thr Arg Thr Arg Glu Ala 
80 85 90 95 



887 



55 



TCA GAG AAC TCT GGA ACC CGT TCG CGC GCG TGG CTG GCG GTG GCG CTG 
Ser Glu Asn Ser Gly Thr Arg Ser Arg Ala Trp Leu Ala val Ala Leu 
100 105 110 



GGC GCT GGG GGG GCA GTG CTG TTG TTG TTG TGG GGC GGG GGT CGG GGT 
Gly Ala Gly Gly Ala val Leu Leu Leu Leu Trp Gly Gly Gly Arg Gly 
115 120 125 



935 



983 
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20 



CCT CCG GCC GTC CTC GCC GCC GTC CCT AGC CCG CCG CCC GCT 7CT CCC 1031 
Pro Pro Ala Val Leu Ala Ala val Pro Ser Pro Pro Pro Ala Ser Pro 
130 135 140 

CGG AGT CAG TAC AAC TTC ATC GCA GAT GTG GTG GAG AAG ACA GCA CCT 1079 
Arg Ser Gin Tyr Asn Phc He Ala Asp val val Glj Lys Thr Ala Pro 
145 150 155 

GCC GTG GTC TAT ATC GAG ATC CTG GAC CGG CAC CCT TTC TTG GGC CGC 1127 
Ala val val Tyr He Glu He Leu asp Arg His Pro Phe Leu Gly Arg 
160 165 170 175 

GAG GTC CCT ATC TCG AAC GGC TCA GGA TTC GTG GTG GCT GCC GAT GGG 1175 
Glu Val Pro He Ser Asn Gly Ser Gly Phe Val Val Ala Ala Asp Gly 
180 185 190 

CTC ATT GTC ACC AAC GCC CAT GTG GTG GCT GAT CGG CGC AGA GTC CGT 1223 
Leu He Val Thr Asn Ala His Val Val Ala Asp Arg Arg Arg Val Arg 
195 200 205 

GTG AGA CTG CTA AGC GGC GAC ACG TAT GAG GCC GTG GTC ACA GCT GTG 1271 
Val Arg Leu Leu Ser Gly Asp Thr Tyr Glu Ala Val val Thr Ala val 
210 215 220 

GAT CCC GTG GCA GAC ATC GCA ACG CTG AGG ATT CAC ACT AAG GAG CCT 13 ID 

Asp Pro Val Ala Asp He Ala Thr Leu Arg He Gin Thr Lys Glu Pro 
225 230 235 

CTC CCC ACG CTG CCT CTG GGA CGC TCA GCT GAT GTC CGG CAA GGG GAG 13 67 
Leu Pro Thr Leu Pro Leu Gly Arg Ser Ala Asp Val Arg Gin Gly Glu 
240 245 250 255 

TTT GTT GTT GCC ATG GGA AGT CCC TTT GCA CTG CAG AAC ACG ATC ACA 1415 
Phe Val Val Ala Met Gly Ser Pro Phe Ala Leu Gin Asn Thr He Thr 
260 265 270 

TCC GGC ATT GTT AGC TCT GCT CAG CGT CCA GCC AGA GAC CTG GGA CTC 1463
Ser Gly Ile Val Ser Ser Ala Gln Arg Pro Ala Arg Asp Leu Gly Leu
275 280 285

CCC CAA ACC AAT GTG GAA TAC ATT CAA ACT GAT GCA GCT ATT GAT TTT 1511
Pro Gln Thr Asn Val Glu Tyr Ile Gln Thr Asp Ala Ala Ile Asp Phe
290 295 300

GGA AAC TCT GGA GGT CCC CTG GTT AAC CTG GAT GGG GAG GTG ATT GGA 1559
Gly Asn Ser Gly Gly Pro Leu Val Asn Leu Asp Gly Glu Val Ile Gly
305 310 315

GTG AAC ACC ATG AAG GTC ACA GCT GGA ATC TCC TTT GCC ATC CCT TCT 1607
Val Asn Thr Met Lys Val Thr Ala Gly Ile Ser Phe Ala Ile Pro Ser
320 325 330 335

GAT CGT CTT CGA GAG TTT CTG CAT CGT GGG GAA AAG AAG AAT TCC TCC 1655
Asp Arg Leu Arg Glu Phe Leu His Arg Gly Glu Lys Lys Asn Ser Ser
340 345 350

TCC GGA ATC AGT GGG TCC CAG CGG CGC TAC ATT GGG GTG ATG ATG CTG 1703
Ser Gly Ile Ser Gly Ser Gln Arg Arg Tyr Ile Gly Val Met Met Leu
355 360 365

ACC CTG AGT CCC AGC ATC CTT GCT GAA CTA CAG CTT CGA GAA CCA AGC 1751
Thr Leu Ser Pro Ser Ile Leu Ala Glu Leu Gln Leu Arg Glu Pro Ser
370 375 380

TTT CCC GAT GTT CAG CAT GGT GTA CTC ATC CAT AAA GTC ATC CTG GGC 1799
Phe Pro Asp Val Gln His Gly Val Leu Ile His Lys Val Ile Leu Gly
385 390 395

TCC CCT GCA CAC CGG GCT GGT CTG CGC CCT GGT GAT GTG ATT TTG GCC
Ser Pro Ala His Arg Ala Gly Leu Arg Pro Gly Asp Val Ile Leu Ala
400 405 410 415

ATT GGG GAG CAG ATG GTA CAA AAT GCT GAA GAT GTT TAT GAA GCT GTT
Ile Gly Glu Gln Met Val Gln Asn Ala Glu Asp Val Tyr Glu Ala Val
420 425 430

CGA ACC CAA TCC CAG TTG GCA GTG CAG ATC CGG CGG GGA CGA GAA ACA
Arg Thr Gln Ser Gln Leu Ala Val Gln Ile Arg Arg Gly Arg Glu Thr
435 440 445

CTG ACC TTA TAT GTG ACC CCT GAG GTC ACA GAA TGAATAGATC ACCAACAGTA
Leu Thr Leu Tyr Val Thr Pro Glu Val Thr Glu
450 455

TGAGGCTCCT GCTCTGATTT CCTCCTTGCC TTTCTGGCTG AGGTTCTGAG GGCACCGAGA
CAGAGGGTTA AATGAACCAG TGGGGGCAGG TCCCTCCAAC CACCAGCACT GACTCCTGGG
CTCTGAAGAA TCACAGAAAC ACTTTTTATA TAAAATAAAA TTATACCTAG CAACATAAAA
AAAAAAAAAA A

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2187 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:
(ix) FEATURE:

(A) NAME/KEY: Coding Sequence

(B) LOCATION: 603...1976
(D) OTHER INFORMATION: 
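
The coding-sequence LOCATION can be cross-checked against the stated length of the translated protein: a 1-based, inclusive range of N nucleotides that excludes the stop codon encodes N/3 residues. The Python sketch below is illustrative only and is not part of the original listing. The function name `codon_count` is hypothetical, and the ranges and protein lengths are the ones stated in this listing for SEQ ID NOs: 24/25, 26/27 and 28/29.

```python
def codon_count(start: int, end: int) -> int:
    """Return the number of complete codons in a 1-based, inclusive range."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"range {start}..{end} is not a whole number of codons")
    return length // 3


# (CDS start, CDS end, stated protein length), as given in the listing.
LISTED = {
    "SEQ ID NO: 24 -> 25": (603, 1976, 458),
    "SEQ ID NO: 26 -> 27": (603, 1733, 377),
    "SEQ ID NO: 28 -> 29": (603, 1910, 436),
}

for name, (start, end, residues) in LISTED.items():
    codons = codon_count(start, end)
    # Print the pair name, the computed codon count, and whether it matches.
    print(name, codons, codons == residues)
```

Under these assumptions each range divides evenly by three and agrees with the stated amino acid count, which suggests that the LOCATION ranges end at the last sense codon rather than at the stop codon.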



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

CGTGGATCCC GAGAAAGAGG CGCAGGACGA GGAGGCAGAA CCCGACTGGC GCGTAGAGCA
GCAGCACGAG CAGTAGGAAG CAGTCACCCG GAAGCCTGGG GGCGAGAGGC GAAGTGGTCA
GGCGCCGAAG GCCGAGAGCA CGCGGGGATC GGTCTCTTCC CGCCGGGTCT CTTACCGGTG
CGAGTCAAAG AGCCGCTCCG GCCCCGGCCC TGAGGGAAGC TCCATAACTG CTGCTTCAGG
AGCGCCCGGC CGTCGCCGCC GCCGCCATTT TCGCGCCCGG CCGCAGGGGC TCTTGGGAAG
GCGGAGTCTT TGGGCATCCG CCCGGGGTGA GGGGACCCGA AGTCCTGAGG CGCGCCGGAA
GGGCTAGCGG TCCCAGCATA CCCCGCGGCC CCTTGGGCCG TCTCACAACT CGCGTCCGGC
GGAGACCACA ATTCCCGGCA TTCGTGGGGC AGGGAGGAGT CGGCCTCCCG GAATCCTGGT
CCCGGCGTGC ACTTCTGAAG GACTTCAGGT ACCGGCGTGC CCCGCGTCCT ACTGTCCGCC
TGCTCGCGTC CTGGGTGCCG CCTCTGAGTA GGGCGGGCGA GGAGGCAGCC AAGGCGGAGC
TG ATG GCT GCG CCG AGG GCG GGG CGG GGT GCA GGC TGG AGC CTT CGG
Met Ala Ala Pro Arg Ala Gly Arg Gly Ala Gly Trp Ser Leu Arg
1 5 10 15

GCA TGG CGG GCT TTG GGG GGC ATT CGC TGG GGG AGG AGA CCC CGT TTG
Ala Trp Arg Ala Leu Gly Gly Ile Arg Trp Gly Arg Arg Pro Arg Leu
20 25 30

ACC CCT GAC CTC CGG GCC CTG CTG ACG TCA GGA ACT TCT GAC CCC CGG
Thr Pro Asp Leu Arg Ala Leu Leu Thr Ser Gly Thr Ser Asp Pro Arg
35 40 45

GCC CGA GTG ACT TAT GGG ACC CCC AGT CTC TGG GCC CGG TTG TCT GTT 791
Ala Arg Val Thr Tyr Gly Thr Pro Ser Leu Trp Ala Arg Leu Ser Val
50 55 60

GGG GTC ACT GAA CCC CGA GCA TGC CTG ACG TCT GGG ACC CCG GGT CCC 839
Gly Val Thr Glu Pro Arg Ala Cys Leu Thr Ser Gly Thr Pro Gly Pro
65 70 75

CGG GCA CAA CTG ACT GCG GTG ACC CCA GAT ACC AGG ACC CGG GAG GCC 887
Arg Ala Gln Leu Thr Ala Val Thr Pro Asp Thr Arg Thr Arg Glu Ala
80 85 90 95

TCA GAG AAC TCT GGA ACC CGT TCG CGC GCG TGG CTG GCG GTG GCG CTG 935
Ser Glu Asn Ser Gly Thr Arg Ser Arg Ala Trp Leu Ala Val Ala Leu
100 105 110

GGC GCT GGG GGG GCA GTG CTG TTG TTG TTG TGG GGC GGG GGT CGG GGT 983
Gly Ala Gly Gly Ala Val Leu Leu Leu Leu Trp Gly Gly Gly Arg Gly
115 120 125

CCT CCG GCC GTC CTC GCC GCC GTC CCT AGC CCG CCG CCC GCT TCT CCC 1031
Pro Pro Ala Val Leu Ala Ala Val Pro Ser Pro Pro Pro Ala Ser Pro
130 135 140

CGG AGT CAG TAC AAC TTC ATC GCA GAT GTG GTG GAG AAG ACA GCA CCT 1079
Arg Ser Gln Tyr Asn Phe Ile Ala Asp Val Val Glu Lys Thr Ala Pro
145 150 155

GCC GTG GTC TAT ATC GAG ATC CTG GAC CGG CAC CCT TTC TTG GGC CGC 1127
Ala Val Val Tyr Ile Glu Ile Leu Asp Arg His Pro Phe Leu Gly Arg
160 165 170 175

GAG GTC CCT ATC TCG AAC GGC TCA GGA TTC GTG GTG GCT GCC GAT GGG 1175
Glu Val Pro Ile Ser Asn Gly Ser Gly Phe Val Val Ala Ala Asp Gly
180 185 190

CTC ATT GTC ACC AAC GCC CAT GTG GTG GCT GAT CGG CGC AGA GTC CGT 1223
Leu Ile Val Thr Asn Ala His Val Val Ala Asp Arg Arg Arg Val Arg
195 200 205

GTG AGA CTG CTA AGC GGC GAC ACG TAT GAG GCC GTG GTC ACA GCT GTG 1271
Val Arg Leu Leu Ser Gly Asp Thr Tyr Glu Ala Val Val Thr Ala Val
210 215 220

GAT CCC GTG GCA GAC ATC GCA ACG CTG AGG ATT CAG ACT AAG GAG CCT 1319
Asp Pro Val Ala Asp Ile Ala Thr Leu Arg Ile Gln Thr Lys Glu Pro
225 230 235

CTC CCC ACG CTG CCT CTG GGA CGC TCA GCT GAT GTC CGG CAA GGG GAG 1367
Leu Pro Thr Leu Pro Leu Gly Arg Ser Ala Asp Val Arg Gln Gly Glu
240 245 250 255

TTT GTT GTT GCC ATG GGA AGT CCC TTT GCA CTG CAG AAC ACG ATC ACA 1415
Phe Val Val Ala Met Gly Ser Pro Phe Ala Leu Gln Asn Thr Ile Thr
260 265 270

TCC GGC ATT GTT AGC TCT GCT CAG CGT CCA GCC AGA GAC CTG GGA CTC 1463
Ser Gly Ile Val Ser Ser Ala Gln Arg Pro Ala Arg Asp Leu Gly Leu
275 280 285

CCC CAA ACC AAT GTC GAA TAC ATT CAA ACT GAT GCA GCT ATT GAT TTT 1511
Pro Gln Thr Asn Val Glu Tyr Ile Gln Thr Asp Ala Ala Ile Asp Phe
290 295 300

GGA AAC TCT GGA GGT CCC CTG GTT AAC CTG GAT GGG GAG GTG ATT GGA 1559
Gly Asn Ser Gly Gly Pro Leu Val Asn Leu Asp Gly Glu Val Ile Gly
305 310 315

GTG AAC ACC ATG AAG GTC ACA GCT GGA ATC TCC TTT GCC ATC CCT TCT 1607
Val Asn Thr Met Lys Val Thr Ala Gly Ile Ser Phe Ala Ile Pro Ser
320 325 330 335

GAT CGT CTT CGA GAG TTT CTG CAT CGT GGG GAA AAG AAG AAT TCC TCC 1655
Asp Arg Leu Arg Glu Phe Leu His Arg Gly Glu Lys Lys Asn Ser Ser
340 345 350

TCC GGA ATC AGT GGG TCC CAG CGG CGC TAC ATT GGG GTG ATG ATG CTG 1703
Ser Gly Ile Ser Gly Ser Gln Arg Arg Tyr Ile Gly Val Met Met Leu
355 360 365

ACC CTG AGT CCC AGC ATC CTT GCT GAA CTA CAG CTT CGA GAA CCA AGC 1751
Thr Leu Ser Pro Ser Ile Leu Ala Glu Leu Gln Leu Arg Glu Pro Ser
370 375 380

TTT CCC GAT GTT CAG CAT GGT GTA CTC ATC CAT AAA GTC ATC CTG GGC 1799
Phe Pro Asp Val Gln His Gly Val Leu Ile His Lys Val Ile Leu Gly
385 390 395

TCC CCT GCA CAC CGG GCT GGT CTG CGG CCT GGT GAT GTG ATT TTG GCC 1847
Ser Pro Ala His Arg Ala Gly Leu Arg Pro Gly Asp Val Ile Leu Ala
400 405 410 415

ATT GGG GAG CAG ATG GTA CAA AAT GCT GAA GAT GTT TAT GAA GCT GTT 1895
Ile Gly Glu Gln Met Val Gln Asn Ala Glu Asp Val Tyr Glu Ala Val
420 425 430

CGA ACC CAA TCC CAG TTG GCA GTG CAG ATC CGG CGG GGA CGA GAA ACA 1943
Arg Thr Gln Ser Gln Leu Ala Val Gln Ile Arg Arg Gly Arg Glu Thr
435 440 445

CTG ACC TTA TAT GTG ACC CCT GAG GTC ACA GAA TGAATAGATC ACCAAGAGTA 1996
Leu Thr Leu Tyr Val Thr Pro Glu Val Thr Glu
450 455

TGAGGCTCCT GCTCTGATTT CCTCCTTGCC TTTCTGGCTG AGGTTCTGAG GGCACCGAGA 2056
CAGAGGGTTA AATGAACCAG TGGGGGCAGG TCCCTCCAAC CACCAGCACT GACTCCTGGG 2116
CTCTGAAGAA TCACAGAAAC ACTTTTTATA TAAAATAAAA TTATACCTAG CAACATAAAA 2176
AAAAAAAAAA A 2187

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 458 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: internal

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:



Met Ala Ala Pro Arg Ala Gly Arg Gly Ala Gly Trp Ser Leu Arg Ala
1 5 10 15






EP 0 828 003 A2 



Trp Arg Ala Leu Gly Gly Ile Arg Trp Gly Arg Arg Pro Arg Leu Thr
20 25 30

Pro Asp Leu Arg Ala Leu Leu Thr Ser Gly Thr Ser Asp Pro Arg Ala
35 40 45

Arg Val Thr Tyr Gly Thr Pro Ser Leu Trp Ala Arg Leu Ser Val Gly
50 55 60

Val Thr Glu Pro Arg Ala Cys Leu Thr Ser Gly Thr Pro Gly Pro Arg
65 70 75 80

Ala Gln Leu Thr Ala Val Thr Pro Asp Thr Arg Thr Arg Glu Ala Ser
85 90 95

Glu Asn Ser Gly Thr Arg Ser Arg Ala Trp Leu Ala Val Ala Leu Gly
100 105 110

Ala Gly Gly Ala Val Leu Leu Leu Leu Trp Gly Gly Gly Arg Gly Pro
115 120 125

Pro Ala Val Leu Ala Ala Val Pro Ser Pro Pro Pro Ala Ser Pro Arg
130 135 140

Ser Gln Tyr Asn Phe Ile Ala Asp Val Val Glu Lys Thr Ala Pro Ala
145 150 155 160

Val Val Tyr Ile Glu Ile Leu Asp Arg His Pro Phe Leu Gly Arg Glu
165 170 175

Val Pro Ile Ser Asn Gly Ser Gly Phe Val Val Ala Ala Asp Gly Leu
180 185 190

Ile Val Thr Asn Ala His Val Val Ala Asp Arg Arg Arg Val Arg Val
195 200 205

Arg Leu Leu Ser Gly Asp Thr Tyr Glu Ala Val Val Thr Ala Val Asp
210 215 220

Pro Val Ala Asp Ile Ala Thr Leu Arg Ile Gln Thr Lys Glu Pro Leu
225 230 235 240

Pro Thr Leu Pro Leu Gly Arg Ser Ala Asp Val Arg Gln Gly Glu Phe
245 250 255

Val Val Ala Met Gly Ser Pro Phe Ala Leu Gln Asn Thr Ile Thr Ser
260 265 270

Gly Ile Val Ser Ser Ala Gln Arg Pro Ala Arg Asp Leu Gly Leu Pro
275 280 285

Gln Thr Asn Val Glu Tyr Ile Gln Thr Asp Ala Ala Ile Asp Phe Gly
290 295 300

Asn Ser Gly Gly Pro Leu Val Asn Leu Asp Gly Glu Val Ile Gly Val
305 310 315 320

Asn Thr Met Lys Val Thr Ala Gly Ile Ser Phe Ala Ile Pro Ser Asp
325 330 335

Arg Leu Arg Glu Phe Leu His Arg Gly Glu Lys Lys Asn Ser Ser Ser
340 345 350

Gly Ile Ser Gly Ser Gln Arg Arg Tyr Ile Gly Val Met Met Leu Thr
355 360 365

Leu Ser Pro Ser Ile Leu Ala Glu Leu Gln Leu Arg Glu Pro Ser Phe
370 375 380

Pro Asp Val Gln His Gly Val Leu Ile His Lys Val Ile Leu Gly Ser
385 390 395 400

Pro Ala His Arg Ala Gly Leu Arg Pro Gly Asp Val Ile Leu Ala Ile
405 410 415

Gly Glu Gln Met Val Gln Asn Ala Glu Asp Val Tyr Glu Ala Val Arg
420 425 430

Thr Gln Ser Gln Leu Ala Val Gln Ile Arg Arg Gly Arg Glu Thr Leu
435 440 445

Thr Leu Tyr Val Thr Pro Glu Val Thr Glu
450 455

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2551 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:

(A) NAME/KEY: Coding Sequence

(B) LOCATION: 603...1733
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:



CGTGGATCCC GAGAAAGAGG CGCAGGACGA GGAGGCAGAA CCCGACTGGC GCGTAGAGCA 60
GCAGCACGAG CAGTAGGAAG CAGTCACCCG GAAGCCTGGG GGCGAGAGGC GAAGTGGTCA 120
GGCGCCGAAG GCCGAGAGCA CGCGGGGATC GGTCTCTTCC CGCCGGGTCT CTTACCGGTG 180
CGAGTCAAAG AGCCGCTCCG GCCCCGGCCC TGAGGGAAGC TCCATAACTG CTGCTTCAGG 240
AGCGCCCGGC CGTCGCCGCC GCCGCCATTT TCGCGCCCGG CCGCAGGGGC TCTTGGGAAG 300
GCGGAGTCTT TGGGCATCCG CCCGGGGTGA GGGGACCCGA AGTCCTGAGG CGCGCCGGAA 360
GGGCTAGCGG TCCCAGCATA CCCCGCGGCC CCTTGGGCCG TCTCACAACT CGCGTCCGGC 420
GGAGACCACA ATTCCCGGCA TTCGTGGGGC AGGGAGGAGT CGGCCTCCCG GAATCCTGGT 480
CCCGGCGTGC ACTTCTGAAG GACTTCAGGT ACCGGCGTGC CCCGCGTCCT ACTGTCCGCC 540
TGCTCGCGTC CTGGGTGCCG CCTCTGAGTA GCCCGGGCGA GGAGGCAGCC AAGGCGGAGC 600

TG ATG GCT GCG CCG AGG GCG GGG CGG GGT GCA GGC TGG AGC CTT CGG 647
Met Ala Ala Pro Arg Ala Gly Arg Gly Ala Gly Trp Ser Leu Arg
1 5 10 15

GCA TGG CGG GCT TTG GGG GGC ATT CGC TGG GGG AGG AGA CCC CGT TTG 695
Ala Trp Arg Ala Leu Gly Gly Ile Arg Trp Gly Arg Arg Pro Arg Leu
20 25 30



ACC CCT GAC CTC CGG GCC CTG CTG ACG TCA GGA ACT TCT GAC CCC CGG 743
Thr Pro Asp Leu Arg Ala Leu Leu Thr Ser Gly Thr Ser Asp Pro Arg
35 40 45

GCC CGA GTG ACT TAT GGG ACC CCC AGT CTC TGG GCC CGG TTG TCT GTT 791
Ala Arg Val Thr Tyr Gly Thr Pro Ser Leu Trp Ala Arg Leu Ser Val
50 55 60

GGG GTC ACT GAA CCC CGA GCA TGC CTG ACG TCT GGG ACC CCG GGT CCC 839
Gly Val Thr Glu Pro Arg Ala Cys Leu Thr Ser Gly Thr Pro Gly Pro
65 70 75


CGG GCA CAA CTG ACT GCG GTG ACC CCA GAT ACC AGG ACC CGG GAG GCC 887
Arg Ala Gln Leu Thr Ala Val Thr Pro Asp Thr Arg Thr Arg Glu Ala
80 85 90 95

TCA GAG AAC TCT GGA ACC CGT TCG CGC GCG TGG CTG GCG GTG GCG CTG 935
Ser Glu Asn Ser Gly Thr Arg Ser Arg Ala Trp Leu Ala Val Ala Leu
100 105 110

GGC GCT GGG GGG GCA GTG CTG TTG TTG TTG TGG GGC GGG GGT CGG GGT 983
Gly Ala Gly Gly Ala Val Leu Leu Leu Leu Trp Gly Gly Gly Arg Gly
115 120 125

CCT CCG GCC GTC CTC GCC GCC GTC CCT AGC CCG CCG CCC GCT TCT CCC 1031
Pro Pro Ala Val Leu Ala Ala Val Pro Ser Pro Pro Pro Ala Ser Pro
130 135 140

CGG AGT CAG TAC AAC TTC ATC GCA GAT GTG GTG GAG AAG ACA GCA CCT 1079
Arg Ser Gln Tyr Asn Phe Ile Ala Asp Val Val Glu Lys Thr Ala Pro
145 150 155

GCC GTG GTC TAT ATC GAG ATC CTG GAC CGG CAC CCT TTC TTG GGC CGC 1127
Ala Val Val Tyr Ile Glu Ile Leu Asp Arg His Pro Phe Leu Gly Arg
160 165 170 175

GAG GTC CCT ATC TCG AAC GGC TCA GGA TTC GTG GTG GCT GCC GAT GGG 1175
Glu Val Pro Ile Ser Asn Gly Ser Gly Phe Val Val Ala Ala Asp Gly
180 185 190

CTC ATT GTC ACC AAC GCC CAT GTG GTG GCT GAT CGG CGC AGA GTC CGT 1223
Leu Ile Val Thr Asn Ala His Val Val Ala Asp Arg Arg Arg Val Arg
195 200 205

GTG AGA CTG CTA AGC GGC GAC ACG TAT GAG GCC GTG GTC ACA GCT GTG 1271
Val Arg Leu Leu Ser Gly Asp Thr Tyr Glu Ala Val Val Thr Ala Val
210 215 220

GAT CCC GTG GCA GAC ATC GCA ACG CTG AGG ATT CAG ACT AAG GAG CCT 1319
Asp Pro Val Ala Asp Ile Ala Thr Leu Arg Ile Gln Thr Lys Glu Pro
225 230 235

CTC CCC ACG CTG CCT CTG GGA CGC TCA GCT GAT GTC CGG CAA GGG GAG 1367
Leu Pro Thr Leu Pro Leu Gly Arg Ser Ala Asp Val Arg Gln Gly Glu
240 245 250 255

TTT GTT GTT GCC ATG GGA AGT CCC TTT GCA CTG CAG AAC ACG ATC ACA 1415
Phe Val Val Ala Met Gly Ser Pro Phe Ala Leu Gln Asn Thr Ile Thr
260 265 270

TCC GGC ATT GTT AGC TCT GCT CAG CGT CCA GCC AGA GAC CTG GGA CTC 1463
Ser Gly Ile Val Ser Ser Ala Gln Arg Pro Ala Arg Asp Leu Gly Leu
275 280 285

CCC CAA ACC AAT GTG GAA TAC ATT CAA ACT GAT GCA GCT ATT GAT TTT 1511
Pro Gln Thr Asn Val Glu Tyr Ile Gln Thr Asp Ala Ala Ile Asp Phe
290 295 300

GGA AAC TCT GGA GGT CCC CTG GTT AAC CTG GTG AGT GAG ACA TCC TTC 1559
Gly Asn Ser Gly Gly Pro Leu Val Asn Leu Val Ser Glu Thr Ser Phe
305 310 315

CTT CCA AGA ATC CCT GCC CCA GGT CAG TGT GGG AAG GGT AGG TTT CCC 1607
Leu Pro Arg Ile Pro Ala Pro Gly Gln Cys Gly Lys Gly Arg Phe Pro
320 325 330 335

CTA ATT CAA GGA TGT TTG GTC AAG TTT CTG AGC AGT TCT TTG TTG GCT 1655
Leu Ile Gln Gly Cys Leu Val Lys Phe Leu Ser Ser Ser Leu Leu Ala
340 345 350

ATC TCT CAA TAT CCA ACC AGA TCT CCC CAA CAC TTG CTG GTA CTT TTG 1703
Ile Ser Gln Tyr Pro Thr Arg Ser Pro Gln His Leu Leu Val Leu Leu
355 360 365

TTC GGG TGC CCC CAT CCC CTA CTA TTT GTT TAGGCTAGGG AACTGGGGGC TGTA 1757
Phe Gly Cys Pro His Pro Leu Leu Phe Val
370 375



TCCCTGCAGG ATGGGGAGGT GATTGGAGTG AACACCATGA AGGTCACAGC TGGAATCTCC 1817
TTTGCCATCC CTTCTGATCG TCTTCGAGAG TTTCTGCATC GTGGGGAAAA GAAGAATTCC 1877
TCCTCCGGAA TCAGTGGGTC CCAGCGGCGC TACATTGGGG TGATGATGCT GACCCTGAGT 1937
CCCAGCATCC TTGCTGAACT ACAGCTTCGA GAACCAAGCT TTCCCGATGT TCAGCATGGT 1997
GTACTCATCC ATAAAGTCAT CCTGGGCTCC CCTGCACACC GGGCTGGTCT GCGGCCTGGT 2057
GATGTGATTT TGGCCATTGG GGAGCAGATG GTACAAAATG CTGAAGATGT TTATGAAGCT 2117
GTTCGAACCC AATCCCAGTT GGCAGTGCAG ATCCGGCGGG GACGAGAAAC ACTGACCTTA 2177
TATGTGACCC CTGAGGTCAC AGAATGAATA GATCACCAAG AGTATGAGGC TCCTGCTCTG 2237
ATTTCCTCCT TGCCTTTCTC CCTCACGTTC TGAGGGCACC GAGACAGAGG GTTAAATGAA 2297
CCAGTGGGGG CAGGTCCCTC CAACCACCAG CACTGACTCC TGGGCTCTGA AGAATCACAG 2357
AAACACTTTT TATATAAAAT AAAATTATAC CTAGCAACAT ATTATACTAA AAAATGAGGT 2417
GGGAGGGCTG GATCTTTTCC CCCACCAAAA GGCTAGAGGT AAAGCTGTAT CCCCCTAAAC 2477
TTAGGGGAGA TACTGGAGCT GACCATCCTG ACCTCCTATT AAAGAAAATG AGCTGCTGAA 2537
AAAAAAAAAA AAAA 2551

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 377 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

Met Ala Ala Pro Arg Ala Gly Arg Gly Ala Gly Trp Ser Leu Arg Ala
1 5 10 15

Trp Arg Ala Leu Gly Gly Ile Arg Trp Gly Arg Arg Pro Arg Leu Thr
20 25 30

Pro Asp Leu Arg Ala Leu Leu Thr Ser Gly Thr Ser Asp Pro Arg Ala
35 40 45

Arg Val Thr Tyr Gly Thr Pro Ser Leu Trp Ala Arg Leu Ser Val Gly
50 55 60

Val Thr Glu Pro Arg Ala Cys Leu Thr Ser Gly Thr Pro Gly Pro Arg
65 70 75 80

Ala Gln Leu Thr Ala Val Thr Pro Asp Thr Arg Thr Arg Glu Ala Ser
85 90 95

Glu Asn Ser Gly Thr Arg Ser Arg Ala Trp Leu Ala Val Ala Leu Gly
100 105 110

Ala Gly Gly Ala Val Leu Leu Leu Leu Trp Gly Gly Gly Arg Gly Pro
115 120 125

Pro Ala Val Leu Ala Ala Val Pro Ser Pro Pro Pro Ala Ser Pro Arg
130 135 140

Ser Gln Tyr Asn Phe Ile Ala Asp Val Val Glu Lys Thr Ala Pro Ala
145 150 155 160

Val Val Tyr Ile Glu Ile Leu Asp Arg His Pro Phe Leu Gly Arg Glu
165 170 175

Val Pro Ile Ser Asn Gly Ser Gly Phe Val Val Ala Ala Asp Gly Leu
180 185 190

Ile Val Thr Asn Ala His Val Val Ala Asp Arg Arg Arg Val Arg Val
195 200 205

Arg Leu Leu Ser Gly Asp Thr Tyr Glu Ala Val Val Thr Ala Val Asp
210 215 220

Pro Val Ala Asp Ile Ala Thr Leu Arg Ile Gln Thr Lys Glu Pro Leu
225 230 235 240

Pro Thr Leu Pro Leu Gly Arg Ser Ala Asp Val Arg Gln Gly Glu Phe
245 250 255

Val Val Ala Met Gly Ser Pro Phe Ala Leu Gln Asn Thr Ile Thr Ser
260 265 270

Gly Ile Val Ser Ser Ala Gln Arg Pro Ala Arg Asp Leu Gly Leu Pro
275 280 285

Gln Thr Asn Val Glu Tyr Ile Gln Thr Asp Ala Ala Ile Asp Phe Gly
290 295 300

Asn Ser Gly Gly Pro Leu Val Asn Leu Val Ser Glu Thr Ser Phe Leu
305 310 315 320

Pro Arg Ile Pro Ala Pro Gly Gln Cys Gly Lys Gly Arg Phe Pro Leu
325 330 335

Ile Gln Gly Cys Leu Val Lys Phe Leu Ser Ser Ser Leu Leu Ala Ile
340 345 350

Ser Gln Tyr Pro Thr Arg Ser Pro Gln His Leu Leu Val Leu Leu Phe
355 360 365

Gly Cys Pro His Pro Leu Leu Phe Val
370 375

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2144 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:

(A) NAME/KEY: Coding Sequence
(B) LOCATION: 603...1910

(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

CGTGGATCCC GAGAAAGAGG CGCAGGACGA GGAGGCAGAA CCCGACTGGC GCGTAGAGCA 60
GCAGCACGAG CAGTAGGAAG CAGTCACCCG GAAGCCTGGG GGCGAGAGGC GAAGTGGTCA 120
GGCCCCGAAG GCCGAGAGCA CGCGGGGATC GGTCTCTTCC CGCCGGGTCT CTTACCGGTG 180
CGAGTCAAAG AGCCGCTCCG GCCCCGGCCC TGAGGGAAGC TCCATAACTG CTGCTTCAGG 240
AGCGCCCGGC CGTCGCCGCC GCCGCCATTT TCGCGCCCGG CCGCAGGGGC TCTTGGGAAG 300
GCGGAGTCTT TGGGCATCCG CCCGGGGTGA GGGGACCCGA AGTCCTGAGG CGCGCCGGAA 360
GGGCTAGCGG TCCCAGCATA CCCCGCGGCC CCTTGGGCCG TCTCACAACT CGCGTCCGGC 420
GGAGACCACA ATTCCCGGCA TTCGTGGGGC AGGGAGGAGT CGGCCTCCCG GAATCCTGGT 480
CCCGGCGTGC ACTTCTGAAG GACTTCAGGT ACCGGCGTGC CCCGCGTCCT ACTGTCCGCC 540
TGCTCGCGTC CTGGGTGCCG CCTCTGAGTA GGGCGGGCGA GGAGGCAGCC AAGGCGGAGC 600

TG ATG GCT GCG CCG AGG GCG GGG CGG GGT GCA GGC TGG AGC CTT CGG 647
Met Ala Ala Pro Arg Ala Gly Arg Gly Ala Gly Trp Ser Leu Arg
1 5 10 15

GCA TGG CGG GCT TTG GGG GGC ATT CGC TGG GGG AGG AGA CCC CGT TTG 695
Ala Trp Arg Ala Leu Gly Gly Ile Arg Trp Gly Arg Arg Pro Arg Leu
20 25 30

ACC CCT GAC CTC CGG GCC CTG CTG ACG TCA GGA ACT TCT GAC CCC CGG 743
Thr Pro Asp Leu Arg Ala Leu Leu Thr Ser Gly Thr Ser Asp Pro Arg
35 40 45

GCC CGA GTG ACT TAT GGG ACC CCC AGT CTC TGG GCC CGG TTG TCT GTT 791
Ala Arg Val Thr Tyr Gly Thr Pro Ser Leu Trp Ala Arg Leu Ser Val
50 55 60

GGG GTC ACT GAA CCC CGA GCA TGC CTG ACG TCT GGG ACC CCG GGT CCC 839
Gly Val Thr Glu Pro Arg Ala Cys Leu Thr Ser Gly Thr Pro Gly Pro
65 70 75

CGG GCA CAA CTG ACT GCG GTG ACC CCA GAT ACC AGG ACC CGG GAG GCC 887
Arg Ala Gln Leu Thr Ala Val Thr Pro Asp Thr Arg Thr Arg Glu Ala
80 85 90 95

TCA GAG AAC TCT GGA ACC CGT TCG CGC GCG TGG CTG GCG GTG GCG CTG 935
Ser Glu Asn Ser Gly Thr Arg Ser Arg Ala Trp Leu Ala Val Ala Leu
100 105 110

GGC GCT GGG GGG GCA GTG CTG TTG TTG TTG TGG GGC GGG GGT CGG GGT
Gly Ala Gly Gly Ala Val Leu Leu Leu Leu Trp Gly Gly Gly Arg Gly
115 120 125

CCT CCG GCC GTC CTC GCC GCC GTC CCT AGC CCG CCG CCC GCT TCT CCC
Pro Pro Ala Val Leu Ala Ala Val Pro Ser Pro Pro Pro Ala Ser Pro
130 135 140

CGG AGT CAG TAC AAC TTC ATC GCA GAT GTG GTG GAG AAG ACA GCA CCT
Arg Ser Gln Tyr Asn Phe Ile Ala Asp Val Val Glu Lys Thr Ala Pro
145 150 155

GCC GTG GTC TAT ATC GAG ATC CTG GAC CGG CAC CCT TTC TTG GGC CGC
Ala Val Val Tyr Ile Glu Ile Leu Asp Arg His Pro Phe Leu Gly Arg
160 165 170 175

GAG GTC CCT ATC TCG AAC GGC TCA GGA TTC GTG GTG GCT GCC GAT GGG
Glu Val Pro Ile Ser Asn Gly Ser Gly Phe Val Val Ala Ala Asp Gly
180 185 190

CTC ATT GTC ACC AAC GCC CAT GTG GTG GCT GAT CGG CGC AGA GTC CGT
Leu Ile Val Thr Asn Ala His Val Val Ala Asp Arg Arg Arg Val Arg
195 200 205

GTG AGA CTG CTA AGC GGC GAC ACG TAT GAG GCC GTG GTC ACA GCT GTG
Val Arg Leu Leu Ser Gly Asp Thr Tyr Glu Ala Val Val Thr Ala Val
210 215 220

GAT CCC GTG GCA GAC ATC GCA ACG CTG AGG ATT CAG ACT AAG GAG CCT
Asp Pro Val Ala Asp Ile Ala Thr Leu Arg Ile Gln Thr Lys Glu Pro
225 230 235

CTC CCC ACG CTG CCT CTG GGA CGC TCA GCT GAT GTC CGG CAA GGG GAG
Leu Pro Thr Leu Pro Leu Gly Arg Ser Ala Asp Val Arg Gln Gly Glu
240 245 250 255

TTT GTT GTT GCC ATG GGA AGT CCC TTT GCA CTG CAG AAC ACG ATC ACA
Phe Val Val Ala Met Gly Ser Pro Phe Ala Leu Gln Asn Thr Ile Thr
260 265 270

TCC GGC ATT GTT AGC TCT GCT CAG CGT CCA GCC AGA GAC CTG GGA CTC
Ser Gly Ile Val Ser Ser Ala Gln Arg Pro Ala Arg Asp Leu Gly Leu
275 280 285

CCC CAA ACC AAT GTG GAA TAC ATT CAA ACT GAT GCA GCT ATT GAT TTT
Pro Gln Thr Asn Val Glu Tyr Ile Gln Thr Asp Ala Ala Ile Asp Phe
290 295 300

GGA AAC TCT GGA GGT CCC CTG GTT AAC CTG GCT AGG GAA CTG GGG GCT
Gly Asn Ser Gly Gly Pro Leu Val Asn Leu Ala Arg Glu Leu Gly Ala
305 310 315

GTA TCC CTG CAG GAT GGG GAG GTG ATT GGA GTG AAC ACC ATG AAG GTC
Val Ser Leu Gln Asp Gly Glu Val Ile Gly Val Asn Thr Met Lys Val
320 325 330 335

ACA GCT GGA ATC TCC TTT GCC ATC CCT TCT GAT CGT CTT CGA GAG TTT
Thr Ala Gly Ile Ser Phe Ala Ile Pro Ser Asp Arg Leu Arg Glu Phe
340 345 350

CTG CAT CGT GGG GAA AAG AAG AAT TCC TCC TCC GGA ATC AGT GGG TCC
Leu His Arg Gly Glu Lys Lys Asn Ser Ser Ser Gly Ile Ser Gly Ser
355 360 365

CAG CGG CGC TAC ATT GGG GTG ATG ATG CTG ACC CTG AGT CCC AGG GCT
Gln Arg Arg Tyr Ile Gly Val Met Met Leu Thr Leu Ser Pro Arg Ala
370 375 380

GGT CTG CGG CCT GGT GAT GTG ATT TTG GCC ATT GGG GAG CAG ATG GTA 1799
Gly Leu Arg Pro Gly Asp Val Ile Leu Ala Ile Gly Glu Gln Met Val
385 390 395

CAA AAT GCT GAA GAT GTT TAT GAA GCT GTT CGA ACC CAA TCC CAG TTG 1847
Gln Asn Ala Glu Asp Val Tyr Glu Ala Val Arg Thr Gln Ser Gln Leu
400 405 410 415

GCA GTG CAG ATC CGG CGG GGA CGA GAA ACA CTG ACC TTA TAT GTG ACC 1895
Ala Val Gln Ile Arg Arg Gly Arg Glu Thr Leu Thr Leu Tyr Val Thr
420 425 430

CCT GAG GTC ACA GAA TGAATAGATC ACCAAGAGTA TGAGGCTCCT GCTCTGATTT CC 1952
Pro Glu Val Thr Glu
435

TCCTTGCCTT TCTGGCTGAG GTTCTGAGGG CACCGAGACA GAGGGTTAAA TGAACCAGTG 2012
GGGGCAGGTC CCTCCAACCA CCAGCACTGA CTCCTGGGCT CTGAAGAATC ACAGAAACAC 2072
TTTTTATATA AAATAAAATT ATACCTAGCA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 2132
AAAAAAAAAA AA 2144

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 436 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: internal
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:



Met Ala Ala Pro Arg Ala Gly Arg Gly Ala Gly Trp Ser Leu Arg Ala
1 5 10 15

Trp Arg Ala Leu Gly Gly Ile Arg Trp Gly Arg Arg Pro Arg Leu Thr
20 25 30

Pro Asp Leu Arg Ala Leu Leu Thr Ser Gly Thr Ser Asp Pro Arg Ala
35 40 45

Arg Val Thr Tyr Gly Thr Pro Ser Leu Trp Ala Arg Leu Ser Val Gly
50 55 60

Val Thr Glu Pro Arg Ala Cys Leu Thr Ser Gly Thr Pro Gly Pro Arg
65 70 75 80

Ala Gln Leu Thr Ala Val Thr Pro Asp Thr Arg Thr Arg Glu Ala Ser
85 90 95

Glu Asn Ser Gly Thr Arg Ser Arg Ala Trp Leu Ala Val Ala Leu Gly
100 105 110

Ala Gly Gly Ala Val Leu Leu Leu Leu Trp Gly Gly Gly Arg Gly Pro
115 120 125

Pro Ala Val Leu Ala Ala Val Pro Ser Pro Pro Pro Ala Ser Pro Arg
130 135 140

Ser Gln Tyr Asn Phe Ile Ala Asp Val Val Glu Lys Thr Ala Pro Ala
145 150 155 160

Val Val Tyr Ile Glu Ile Leu Asp Arg His Pro Phe Leu Gly Arg Glu
165 170 175

Val Pro Ile Ser Asn Gly Ser Gly Phe Val Val Ala Ala Asp Gly Leu
180 185 190

Ile Val Thr Asn Ala His Val Val Ala Asp Arg Arg Arg Val Arg Val
195 200 205

Arg Leu Leu Ser Gly Asp Thr Tyr Glu Ala Val Val Thr Ala Val Asp
210 215 220

Pro Val Ala Asp Ile Ala Thr Leu Arg Ile Gln Thr Lys Glu Pro Leu
225 230 235 240

Pro Thr Leu Pro Leu Gly Arg Ser Ala Asp Val Arg Gln Gly Glu Phe
245 250 255

Val Val Ala Met Gly Ser Pro Phe Ala Leu Gln Asn Thr Ile Thr Ser
260 265 270

Gly Ile Val Ser Ser Ala Gln Arg Pro Ala Arg Asp Leu Gly Leu Pro
275 280 285

Gln Thr Asn Val Glu Tyr Ile Gln Thr Asp Ala Ala Ile Asp Phe Gly
290 295 300

Asn Ser Gly Gly Pro Leu Val Asn Leu Ala Arg Glu Leu Gly Ala Val
305 310 315 320

Ser Leu Gln Asp Gly Glu Val Ile Gly Val Asn Thr Met Lys Val Thr
325 330 335

Ala Gly Ile Ser Phe Ala Ile Pro Ser Asp Arg Leu Arg Glu Phe Leu
340 345 350

His Arg Gly Glu Lys Lys Asn Ser Ser Ser Gly Ile Ser Gly Ser Gln
355 360 365

Arg Arg Tyr Ile Gly Val Met Met Leu Thr Leu Ser Pro Arg Ala Gly
370 375 380

Leu Arg Pro Gly Asp Val Ile Leu Ala Ile Gly Glu Gln Met Val Gln
385 390 395 400

Asn Ala Glu Asp Val Tyr Glu Ala Val Arg Thr Gln Ser Gln Leu Ala
405 410 415

Val Gln Ile Arg Arg Gly Arg Glu Thr Leu Thr Leu Tyr Val Thr Pro
420 425 430

Glu Val Thr Glu
435

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2187 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(ix) FEATURE: Polymorphic variants at 672 and 1435

aa24=Arg/Cys aa278=Ala/Val

(A) NAME/KEY: Coding Sequence

(B) LOCATION: 603...1976
(D) OTHER INFORMATION: 
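
The polymorphic nucleotide positions given for SEQ ID NO: 30 can be related to the affected residues by counting codons from the start of the coding sequence at position 603. The Python sketch below is illustrative only and is not part of the original listing. The names `locate`, `expand` and `residues` are hypothetical, only the four codons needed here are tabulated, and the second ambiguous codon is written as GYT, which is an assumption inferred from position 1435 falling on the second base of codon 278 and from the stated Ala/Val alternative.

```python
CDS_START = 603  # from "(B) LOCATION: 603...1976"

# Subset of the IUPAC nucleotide ambiguity codes (Y = C or T, R = A or G).
IUPAC = {"Y": "CT", "R": "AG"}

# Only the codons needed for this example, from the standard genetic code.
CODON_TO_RESIDUE = {"CGC": "Arg", "TGC": "Cys", "GCT": "Ala", "GTT": "Val"}


def locate(position, cds_start=CDS_START):
    """Map a 1-based nucleotide position to (residue number, base within codon)."""
    offset = position - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the coding sequence")
    return offset // 3 + 1, offset % 3 + 1


def expand(codon):
    """List every unambiguous codon represented by an IUPAC-coded codon."""
    variants = [""]
    for base in codon:
        variants = [v + b for v in variants for b in IUPAC.get(base, base)]
    return variants


def residues(codon):
    """Translate each expansion of an ambiguous codon to a three-letter residue."""
    return [CODON_TO_RESIDUE[c] for c in expand(codon)]


print(locate(672), residues("YGC"))   # first variant: residue 24
print(locate(1435), residues("GYT"))  # second variant: residue 278 (GYT assumed)
```

Under these assumptions, position 672 is the first base of codon 24 (Arg or Cys) and position 1435 is the second base of codon 278 (Ala or Val), which agrees with the aa24 and aa278 annotations in the feature entry.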






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 



CGTGGATCCC GAGAAAGAGG CGCAGGACGA GGAGGCAGAA OCCGACTGGC GCGTAGAGCA 60 

GCAGCACGAG CAGTAGGAAG CAGTCACCCG GAAGCCTGGG GGCGAGAGGC GAAGTGGTCA 12 0 

SO GGCGCCGAAG GCCGAGAGCA CGCGGGGATC GGTCTCTTCC CGCCGGGTCT CTTACCGGTG 18 0 

CGAGTCAAAG AGCCGCTCCG GCCCCGGCCC TGAGGGAAGC TCCATAACTG CTGCTTCAGG 24 0 

AGCGCCCGGC CGTCGCCGCC GCCGCCATTT TCGCGCCCGG CCGCAGGGGC TCTTGGGAAG 300 

GCGGAGTCTT TGGGCATCCG CCCGGGGTGA GGGGACCCGA AGTCCTGAGG CGCGCCGGAA 360 

GGGCTAGCGG TCeCAGCATA-CeCCGCGGCC CCTTGGGCCG TCTCACAACT CGCGTCCGGC 42 0 

GGAGACCACA ATTCCCGGCA TTCGTGGGGC AGGGAGGAGT CGGCCTCCCG GAATCCTGGT 480 

55 CCCGGCGTGC ACTTCTGAAG GACTTCAGGT ACCGGCGTGC CCCGCGTCCT ACTGTCCGCC 54 0 

TGCTCGCGTC CTGGCTCCCC CCTCTGACTA GGGCGGGCGA GGAGGCAGCC AAGGCGGAGC 600 
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TG ATG GCT GCG CCG AGG GOG GGG CGG GGT GCA GGC TGG AGC CTT CGG 647 
Met Ala Ala Pro Arg Ala Gly Arg Gly Ala Gly Trp Ser Leu Arg 
15 10 15 

5 

GCA TGG CGG GCT TTG GGG GGC ATT YGC TGG GGG AGG AGA CCC CGT TTG 695 
Ala Trp Arg Ala Leu Gly Gly lie Xaa Trp Gly Arg Arg Pro Arg Leu 
20 25 30 

ACC CCT GAC CTC CGG GCC CTG CTG ACG TCA GGA ACT TCT GAC CCC CGG 743 
10 T hr Pro Asp Leu Arg Ala Leu Leu Thr Ser Gly Thr Ser Asp Pro Arg 

35 40 45 

GCC CGA GTG ACT TAT GGG ACC CCC AGT CTC TGG GCC CGG TTG TCT GTT 791 

Ala Arg Val Thr Tyr Gly Thr Pro Ser Leu Trp Ala Arg Leu Ser Val 

50 55 60 

75 

GGG GTC ACT GAA CCC CGA GCA TGC CTG ACG TCT GGG ACC CCG GGT CCC 839 

Gly Val Thr Glu Pro Arg Ala Cys Leu Thr Ser Gly Thr Pro Gly Pro 
65 70 75 

CGG GCA CAA CTG ACT GCG GTG ACC CCA GAT ACC AGG ACC CGG GAG GCC 887 
20 Ar 9 Ala Gin Leu Thr Ala Val Thr Pro Asp Thr Arg Thr Arg Glu Ala 

80 85 90 95 

TCA GAG AAC TCT GGA ACC CGT TCG CGC GCG TGG CTG GCG GTG GCG CTG 935 

Ser Glu Asn ser Gly Thr Arg ser Arg Ala Trp Leu Ala Val Ala Leu 

100 105 110 

25 

GGC GCT GGG GGG GCA GTG CTG TTG TTG TTG TGG GGC GGG GGT CGG GGT 983 

Gly Ala Gly Gly Ala Val Leu Leu Leu Leu Trp Gly Gly Gly Arg Gly 
115 120 125 

CCT CCG GCC GTC CTC GCC GCC GTC CCT AGC CCG CCG CCC GCT TCT CCC 1031 
30 p ro Pro Ala Val Leu Ala Ala Val Pro Ser Pro Pro Pro Ala Ser Pro 

130 135 140 

CGG AGT CAG TAC AAC TTC ATC GCA GAT GTG GTG GAG AAG ACA GCA CCT 10 79 

Arg Ser Gin Tyr Asn Phe lie Ala Asp Val Val Glu Lys Thr Ala Pro 

145 150 155 

3S 

GCC GTG GTC TAT ATC GAG ATC CTG GAC CGG CAC CCT TTC TTG GGC CGC 1127 

Ala Val Val Tyr lie Glu lie Leu Asp Arg His Pro Phe Leu Gly Arg 

160 165 170 175 

GAG GTC CCT ATC TCG AAC GGC TCA GGA TTC GTG GTG GCT GCC GAT GGG 1175 
40 Glu val Pro lie Ser Asn Gly Ser Gly Phe val val Ala Ala Asp Gly 

180 185 190 

CTC ATT GTC ACC AAC GCC CAT GTG GTG GCT GAT CGG CGC AGA GTC CGT 1223 

Leu He val Thr Asn Ala His Val val Ala Asp Arg Arg Arg val Arg 
195 200 205 

45 

GTG AGA CTG CTA AGC GGC GAC ACG TAT GAG GCC GTG GTC ACA GCT GTG 1271 

val Arg Leu Leu Ser Gly Asp Thr Tyr Glu Ala val val Thr Ala val 
210 215 220 

CAT CCC GTG GCA GAC ATC GCA ACG CTG AGG ATT CAG ACT AAG GAG CCT 1319 
SO Asp Pro Val Ala Asp lie Ala Thr Leu Arg lie Gin Thr Lys Glu Pro 

225 230 235 

CTC CCC ACG CTG CCT CTG GGA CGC TCA GCT GAT GTC CGG CAA GGG GAG 1367 
Leu Pro Thr Leu Pro Leu Gly Arg Ser Ala Asp Val Arg Gin Gly Glu 
240 245 250 255 



55 



TTT GTT GTT GCC ATG GGA AGT CCC TTT GCA CTG CAG AAC ACG ATC ACA 1415 
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Phe Val Val Ala Met Gly Ser Pro Phe Ala Leu Gin Asn Thr lie Thr 
260 265 270 



TCC GGC. ATT GTT AGC TCT YCT CAG CGT CCA GCC AGA GAC CTG GGA CTC 
Ser Gly lie val Ser Ser Xaa Gin Arg Pro Ala Arg Asp Leu Gly Leu 
275 280 285 



1463 



10 



CCC CAA ACC AAT GTG GAA TAC ATT CAA ACT GAT GCA GCT ATT GAT TTT 1511 

pro Gin Thr Asn val Glu Tyr lie Gin Thr Asp Ala Ala Tie Asp Phe 
290 295 300 

GGA AAC TCT GGA GGT CCC CTG GTT AAC CTG GAT GGG GAG GTG ATT GGA 1559 

Gly Asn Ser Gly Gly Pro Leu Val Asn Leu Asp Gly Glu val He Gly 
305 310 315 



IS 



20 



GTG AAC ACC ATG AAG GTC ACA GCT GGA ATC TCC TTT GCC ATC CCT TCT 
val Asn Thr Met Lys Val Thr Ala Gly He Ser Phe Ala He Pre Ser 
320 325 330 335 



1607 



GAT CGT CTT CGA GAG TTT CTG CAT CGT GGG GAA AAG AAC AAT TCC TCC 1655 
Asp Arg Leu Arg Glu Phe Leu His Arg Gly Glu Lys Lys Asn Ser Ser 
340 345 350 

TCC GGA ATC AGT GGG TCC CAG CGG CGC TAC ATT GGG GTG ATG ATG CTG 1703 
Ser Gly He Ser Gly Ser Gin Arg Arg Tyr lie Gly Val Met Met Leu 
355 360 36b 



ACC CTG AGT CCC AGC ATC CTT GCT GAA CTA CAG CTT CGA GAA CCA AGC 1751
Thr Leu Ser Pro Ser Ile Leu Ala Glu Leu Gln Leu Arg Glu Pro Ser
370 375 380

TTT CCC GAT GTT CAG CAT GGT GTA CTC ATC CAT AAA GTC ATC CTG GGC 1799
Phe Pro Asp Val Gln His Gly Val Leu Ile His Lys Val Ile Leu Gly
385 390 395

TCC CCT GCA CAC CGG GCT GGT CTG CGG CCT GGT GAT GTG ATT TTG GCC 1847
Ser Pro Ala His Arg Ala Gly Leu Arg Pro Gly Asp Val Ile Leu Ala
400 405 410 415

ATT GGG GAG CAG ATG GTA CAA AAT GCT GAA GAT GTT TAT GAA GCT GTT 1895
Ile Gly Glu Gln Met Val Gln Asn Ala Glu Asp Val Tyr Glu Ala Val
420 425 430

CGA ACC CAA TCC CAG TTG GCA GTG CAG ATC CGG CGG GGA CGA GAA ACA 1943
Arg Thr Gln Ser Gln Leu Ala Val Gln Ile Arg Arg Gly Arg Glu Thr
435 440 445



CTG ACC TTA TAT GTG ACC CCT GAG GTC ACA GAA TGAATAGATC ACCAAGAGTA 1996
Leu Thr Leu Tyr Val Thr Pro Glu Val Thr Glu
450 455

TGAGGCTCCT GCTCTGATTT CCTCCTTGCC TTTCTGGCTG AGGTTCTGAG GGCACCGAGA 2056
CAGAGGGTTA AATGAACCAG TGGGGGCAGG TCCCTCCAAC CACCAGCACT GACTCCTGGG 2116
CTCTGAAGAA TCACAGAAAC ACTTTTTATA TAAAATAAAA TTATACCTAG CAACATAAAA 2176
AAAAAAAAAA A 2187

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 458 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE: internal
(vi) ORIGINAL SOURCE:

Feature

24 Xaa = Arg or Cys
278 Xaa = Ala or Val

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

Met Ala Ala Pro Arg Ala Gly Arg Gly Ala Gly Trp Ser Leu Arg Ala 

1 5 10 15 

Trp Arg Ala Leu Gly Gly Ile Xaa Trp Gly Arg Arg Pro Arg Leu Thr

20 25 30

Pro Asp Leu Arg Ala Leu Leu Thr Ser Gly Thr Ser Asp Pro Arg Ala

35 40 45

Arg Val Thr Tyr Gly Thr Pro Ser Leu Trp Ala Arg Leu Ser Val Gly

50 55 60

Val Thr Glu Pro Arg Ala Cys Leu Thr Ser Gly Thr Pro Gly Pro Arg
65 70 75 80

Ala Gln Leu Thr Ala Val Thr Pro Asp Thr Arg Thr Arg Glu Ala Ser

85 90 95

Glu Asn Ser Gly Thr Arg Ser Arg Ala Trp Leu Ala Val Ala Leu Gly 

100 105 110 

Ala Gly Gly Ala Val Leu Leu Leu Leu Trp Gly Gly Gly Arg Gly Pro 

115 120 125 

Pro Ala Val Leu Ala Ala Val Pro Ser Pro Pro Pro Ala Ser Pro Arg 

130 135 140

Ser Gln Tyr Asn Phe Ile Ala Asp Val Val Glu Lys Thr Ala Pro Ala
145 150 155 160

Val Val Tyr Ile Glu Ile Leu Asp Arg His Pro Phe Leu Gly Arg Glu

165 170 175

Val Pro Ile Ser Asn Gly Ser Gly Phe Val Val Ala Ala Asp Gly Leu

180 185 190

Ile Val Thr Asn Ala His Val Val Ala Asp Arg Arg Arg Val Arg Val

195 200 205 

Arg Leu Leu Ser Gly Asp Thr Tyr Glu Ala Val Val Thr Ala Val Asp 

210 215 220 

Pro Val Ala Asp Ile Ala Thr Leu Arg Ile Gln Thr Lys Glu Pro Leu
225 230 235 240

Pro Thr Leu Pro Leu Gly Arg Ser Ala Asp Val Arg Gln Gly Glu Phe

245 250 255

Val Val Ala Met Gly Ser Pro Phe Ala Leu Gln Asn Thr Ile Thr Ser

260 265 270

Gly Ile Val Ser Ser Xaa Gln Arg Pro Ala Arg Asp Leu Gly Leu Pro

275 280 285

Gln Thr Asn Val Glu Tyr Ile Gln Thr Asp Ala Ala Ile Asp Phe Gly

290 295 300

Asn Ser Gly Gly Pro Leu Val Asn Leu Asp Gly Glu Val Ile Gly Val
305 310 315 320

Asn Thr Met Lys Val Thr Ala Gly Ile Ser Phe Ala Ile Pro Ser Asp

325 330 335 

Arg Leu Arg Glu Phe Leu His Arg Gly Glu Lys Lys Asn Ser Ser Ser 

340 345 350 

Gly Ile Ser Gly Ser Gln Arg Arg Tyr Ile Gly Val Met Met Leu Thr

355 360 365

Leu Ser Pro Ser Ile Leu Ala Glu Leu Gln Leu Arg Glu Pro Ser Phe

370 375 380

Pro Asp Val Gln His Gly Val Leu Ile His Lys Val Ile Leu Gly Ser
385 390 395 400

Pro Ala His Arg Ala Gly Leu Arg Pro Gly Asp Val Ile Leu Ala Ile

405 410 415

Gly Glu Gln Met Val Gln Asn Ala Glu Asp Val Tyr Glu Ala Val Arg
420 425 430
Thr Gln Ser Gln Leu Ala Val Gln Ile Arg Arg Gly Arg Glu Thr Leu

435 440 445

Thr Leu Tyr Val Thr Pro Glu Val Thr Glu
450 455
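
The two Xaa positions in SEQ ID NO: 31 correspond to single-base ambiguity in the coding sequence. As a minimal sketch, assuming the standard genetic code and the IUPAC symbol Y (C or T), the Python below expands an ambiguous codon and lists the residues it can encode. The codons `YGC` (residue 24) and `GYT` (residue 278) are read off the nucleotide sequences in this document; the function name is illustrative only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Only the ambiguity symbol needed here; other IUPAC codes are omitted.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT"}


def possible_residues(codon):
    """Return the set of one-letter residues an ambiguous codon can encode."""
    options = [IUPAC[base] for base in codon.upper()]
    return {CODON_TABLE["".join(c)] for c in product(*options)}


print(sorted(possible_residues("YGC")))  # ['C', 'R'] -> Cys or Arg (residue 24)
print(sorted(possible_residues("GYT")))  # ['A', 'V'] -> Ala or Val (residue 278)
```

Both results agree with the Feature entries listed for SEQ ID NO: 31.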

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
CATCCGGCAT TGTTAGCTCT GC 22
(2) INFORMATION FOR SEQ ID NO:33: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 
CATCCGGCAT TGTTAGCTCT GT 22 
(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
CAATAGCTGC ATCAGTTTGA ATG 23 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
TGGCGGGCTT TGGGGGGCAT TC 22 
(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
TGGCGGGCTT TGGGGGGCAT TT 22 
(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
GACGTCAGCA GGGCCCGGAG GTC 23 
(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
GATACCCCAG CAGAAGCTGG 20 
(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
GATACCCCAG CAGAAGCTGT 20 

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:

GCTGACATCA TTGGCGGAGA C 21
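
The forward primers of SEQ ID NOs: 32 and 33 are identical except for their 3'-terminal base, which is what makes them allele-specific when each is paired with the common reverse primer of SEQ ID NO: 34. The sketch below is an idealised exact-match model in Python, not a description of the PCR conditions used in the source. It locates each primer on a template fragment and reports the span of the expected product. The template is copied from alignment positions 1401 to 1510 of Figure 2 (the C allele at nucleotide 1435); the function names and the exact-match criterion are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


PRIMERS = {
    32: "CATCCGGCATTGTTAGCTCTGC",   # forward, 3' base C
    33: "CATCCGGCATTGTTAGCTCTGT",   # forward, 3' base T
    34: "CAATAGCTGCATCAGTTTGAATG",  # common reverse primer
}

# Alignment positions 1401-1510, written as the ten-base groups of the figure.
TEMPLATE = "".join([
    "CAGAACACGA", "TCACATCCGG", "CATTGTTAGC", "TCTGCTCAGC", "GTCCAGCCAG",
    "AGACCTGGGA", "CTCCCCCAAA", "CCAATGTGGA", "ATACATTCAA", "ACTGATGCAG",
    "CTATTGATTT",
])
TEMPLATE_START = 1401


def amplicon_span(template, forward, reverse):
    """Return (first, last) template positions of the product, or None.

    A primer is treated as binding only if it matches the template exactly,
    which is a simplification of real allele-specific priming.
    """
    start = template.find(forward)
    rev_site = template.find(reverse_complement(reverse))
    if start < 0 or rev_site < 0 or rev_site < start:
        return None
    return (TEMPLATE_START + start,
            TEMPLATE_START + rev_site + len(reverse) - 1)


print(amplicon_span(TEMPLATE, PRIMERS[32], PRIMERS[34]))  # (1414, 1506)
print(amplicon_span(TEMPLATE, PRIMERS[33], PRIMERS[34]))  # None
```

In this model the C-allele primer ends exactly on nucleotide 1435 and gives a 93-base product, while the T-allele primer finds no exact match on the C-allele template.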



Claims

1. An isolated polynucleotide encoding a biologically active PSP1 polypeptide.

2. An isolated polynucleotide selected from the group consisting of:

(a) a polynucleotide encoding PSP1-1 having the nucleotide sequence as set forth in SEQ ID NO: 24 from
nucleotide 603 to 1979;

(b) a polynucleotide encoding PSP1-2 having the nucleotide sequence as set forth in SEQ ID NO: 23 from
nucleotide 603 to 1979;

(c) a polynucleotide encoding PSP1-3 having the nucleotide sequence as set forth in SEQ ID NO: 26 from
nucleotide 603 to 1736;

(d) a polynucleotide encoding PSP1-4 having the nucleotide sequence as set forth in SEQ ID NO: 28 from
nucleotide 603 to 1913; and

(e) a polynucleotide encoding D87257 (1325T) protein.

3. An isolated polynucleotide substantially similar to SEQ ID NO: 24; SEQ ID NO: 23; SEQ ID NO: 26; or SEQ ID 
NO: 28. 
4. An isolated polynucleotide as claimed in claim 2 or 3 wherein nucleotides 672 and 1435 are independently selected
from C and T.

5. An isolated polynucleotide having the nucleotide sequence as set forth in SEQ ID NOs: 23, 24, 26, 28, or 30 or 
SEQ ID NO: 17 wherein nucleotide 1325 is T. 

6. A functional polypeptide encoded by the polynucleotide of any one of claims 1 to 5. 

7. The functional polypeptide of claim 6 which is: 

PSP1-1 having the amino acid sequence set forth in SEQ ID NO: 25 or 30; 
PSP1-2 having the amino acid sequence set forth in SEQ ID NO: 8; 
PSP1-3 having the amino acid sequence set forth in SEQ ID NO: 27; 
PSP1-4 having the amino acid sequence set forth in SEQ ID NO: 29; or 

D87257 (1325T) protein having the amino acid sequence set forth in SEQ ID NO: 18 wherein amino acid 
residue 213 is val. 

8. The polynucleotide as claimed in any one of claims 1 to 5 which is DNA or RNA.

9. A vector comprising the DNA of claim 8. 

10. A recombinant host cell comprising the vector of claim 9. 

11. A method for preparing essentially pure PSP1 protein or D87257 (1325T) protein comprising culturing the
recombinant host cell of claim 10 under conditions promoting expression of the protein and recovering the
expressed protein.

12. PSP1 or D87257 (1325T) produced by the process of claim 11.

13. An antisense oligonucleotide comprising a sequence which is capable of binding to the polynucleotide of any one 
of claims 1 to 5 or D87258. 

14. A modulator of the polypeptide of claim 6 or of D87258. 

15. A method for assaying a medium for the presence of a substance that modulates PSP1 or D87258 activity: 

(i) by affecting the binding of PSP1 or D87258 to cellular binding partners comprising the steps of: 

(a) providing a PSP1 polypeptide of claim 8 or D87258 protein, or a functional derivative thereof and a
cellular binding partner or synthetic analog thereof; 

(b) incubating with a test substance which is suspected of modulating PSP1 or D87258 activity under 
conditions which permit the formation of a PSP1 or D87258 protein/cellular binding partner complex; 

(c) assaying for the presence of the complex, free PSP1 or D87258 protein or free cellular binding partner; 
and 

(d) comparing to a control to determine the effect of the substance; 

(ii) by inhibiting proteolytic activity on a cellular substrate comprising the steps of: 

(a) providing a PSP1 polypeptide of claim 8 or D87258, or a functional derivative thereof and a cellular 
substrate or synthetic analog thereof; 

(b) incubating with a test substance which is suspected of inhibiting PSP1 or D87258 activity under
conditions which permit the formation of a PSP1 or D87258 enzyme/substrate complex and subsequent
cleavage of the substrate;

(c) assaying for the presence of proteolytically cleaved substrate; and 

(d) comparing to a control to determine the effect of the substance. 

(iii) by direct binding to PSP1 or D87258 protein comprising the steps of: 
(a) providing a labelled PSP1 polypeptide of claim 8 or D87258, or a functional derivative thereof;

(b) providing solid support-associated modulator candidates; 

(c) incubating a mixture of the labelled PSP1 or D87258 protein with the support-associated modulator
candidates under conditions which can permit the formation of a PSP1 or D87258 protein/modulator
candidate complex;

(d) separating the solid support from free soluble labelled PSP1 or D87258 protein; 

(e) assaying for the presence of solid support-associated labelled protein; 

(f) isolating the solid support complexed with labelled PSP1 or D87258 protein; and 

(g) identifying the modulator candidate. 

16. PSP1 or D87258 protein modulating compounds identified by the method of claim 15. 

17. The use of a modulating compound of claim 16 in the manufacture of a medicament for treating a patient having
need to modulate PSP1 or D87258 activity.

18. A pharmaceutical composition comprising the modulating compound of claim 16 and a pharmaceutically
acceptable carrier.

19. A method of diagnosing conditions associated with PSP1 or D87258 protein deficiency which comprises: 

(a) isolating a polynucleotide sample from an individual; 

(b) assaying the polynucleotide sample and a polynucleotide encoding PSP1 having the nucleotide sequence 
as set forth in SEQ ID NOs: 23, 24, 26, 28 or 30 or a polynucleotide encoding D87258 as set forth in SEQ ID 
NO: 18; and 

(c) comparing differences between the polynucleotide sample and the PSP1 or D87258 polynucleotide,
wherein any differences indicate mutations in the PSP1 or D87258 sequence.

20. A method of treating conditions which are related to insufficient PSP1 or D87258 protein function which comprises: 

(i) the steps of: 

(a) isolating cells from a patient deficient in PSP1 or D87258 protein function;

(b) altering the cells by transfecting the polynucleotide of any one of claims 1 to 7, or a polynucleotide 
encoding D87258 as set forth in SEQ ID NO: 18 into the cells wherein a PSP1 or D87258 protein is 
expressed; and 

(c) introducing the cells back to the patient to alleviate the condition; or

(ii) administering the polynucleotide of any one of claims 1 to 5, or a polynucleotide encoding D87258 as
set forth in SEQ ID NO: 18, to a patient deficient in PSP1 or D87258 protein function wherein a PSP1 or
D87258 protein is expressed and alleviates the condition.

21. An antibody immunoreactive with PSP1, D87258 or an immunogen thereof.

22. A transgenic non-human animal capable of expressing in any cell thereof the DNA of claim 5 or a polynucleotide 
encoding D87258 as set forth in SEQ ID NO: 18. 

23. A method for determining the genetic predisposition to neurodegeneration in a patient comprising detecting PSP1 
or D87258 polymorphisms in a sample from a patient, preferably neurodegeneration predisposition to Alzheimer's 
disease. 

24. The method of claim 23 wherein the polymorphisms detected are at nucleotide 672 of PSP1, at nucleotide 1435
of PSP1 or at nucleotide 1325 of D87258.

25. The method of claim 24 wherein the polymorphisms are detected by polymerase chain reaction, preferably wherein 
the oligonucleotides used with the polymerase chain reaction have a nucleotide sequence selected from the group 
consisting of SEQ ID NOs: 32, 33, 34, 35, 36, 37, 38, 39, or 40. 

26. An isolated polynucleotide having the nucleotide sequence as set forth in SEQ ID NO: 32, 33, 34, 35, 36, 37, 38,
39, or 40.






EP 0 828 003 A2 



27. An oligonucleotide pair comprising oligonucleotides having the nucleotide sequence as set forth in: 

(a) SEQ ID NOs: 32 and 34; 

(b) SEQ ID NOs: 33 and 34; 

(c) SEQ ID NOs: 35 and 37; 

(d) SEQ ID NOs: 36 and 37; 

(e) SEQ ID NOs: 38 and 40; or 

(f) SEQ ID NOs: 39 and 40. 
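
The nucleotide ranges recited in claim 2 can be sanity-checked with simple arithmetic. The following is a brief sketch, assuming each range is inclusive and that it ends with the stop codon. That assumption is not stated in the claims, and the helper name is hypothetical.

```python
CODING_RANGES = {
    "PSP1-1": (603, 1979),
    "PSP1-2": (603, 1979),
    "PSP1-3": (603, 1736),
    "PSP1-4": (603, 1913),
}


def residues_encoded(first, last, includes_stop=True):
    """Number of amino acids encoded by an inclusive nucleotide range."""
    length = last - first + 1
    if length % 3:
        raise ValueError("range is not a whole number of codons")
    codons = length // 3
    return codons - 1 if includes_stop else codons


for name, (first, last) in CODING_RANGES.items():
    print(name, residues_encoded(first, last))
# PSP1-1 458
# PSP1-2 458
# PSP1-3 377
# PSP1-4 436
```

Under that assumption every range is a whole number of codons, and the 458-residue result for the 603 to 1979 range matches the length given for SEQ ID NO: 31.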



41 SGVSDPRARVTYGTPSLWARLSVGVTEPRACLTSGTPGPRAQLTAVTPDT 90
2 KKTTLALSRLALSLGLALSPLSATAAETSSATTAQQMPSLAPMLEKVMPS 51

91 RTREASENSGTRSRAWLAVALGAGGAVLLLLWGGGRGPPAVLAAVPSPPP 140
52 VVSINVEGSTTVNTPRMPRNFQQFFGDD....SPFCQEGSPFQ 90

141 ASPRSQYNFIADVVEKTAPAVVYIEILDRHPFLGREVPISNGSGFVVAAD 190
91 SSPFCQ...GGQGGNGGGQQQKFMALGSGVIIDAD 122

191 .GLIVTNAHVVADRRRVRVRLLSGDTYEAVVTAVDPVADIATLRIQTKEP 239
123 KGYVVTNNHVVDNATVIKVQLSDGRKFDAKMVGKDPRSDIALIQIQNPKN 172

240 LPTLPLGRSADVRQGEFVVAMGSPFALQNTITSGIVSSAQRPARDLGLPQ 289
173 LTAIKMADSDALRVGDYTVGIGNPFGLGETVTSGIVSALGRS....GLNA 218

290 TNVE.YIQTDAAIDFGNSGGPLVNLDGEVIGVNTMKVTAGISFAI 333
219 ENYENFIQTDAAINRGNSGGALVNLNGELIGINTAILAPDGGNIGIGFAI 268

334 PSDRLREFLHRGE 346
269 PSNMVKNLTSQMVEYGQVKRGELGIMGTELNSELAKAMKVDAQRGAFVSQ 318

347 .KKNSSSGISGSQRRYIGVM....MLTL... 369
319 VLPNSSAAKAGIKAGDVITSLNGKPISSFAALRAQVGTMPVGSKLTLGLL 368

370 SPSILAELQLREPSFPDVQHGVLIHKVIL 398
369 RDGKQVNVNLELQQSSQNQVDSSSIFNGIEGAEMSNKGKDQGVVVNNVKT 418

399 GSPAHRAGLRPGDVILAIGEQMVQNAEDVYEAVRTQ.SQLAVQIRRGRET 447
419 GTPAAQIGLKKGDVIIGANQQAVKNIAELRKVLDSKPSVLALNIQRGDRH 468

448 LTLYVTPEVTE 458
469 LPVNAVISLNP 479

FIGURE 1 






1 50
PSP1-2 CGTGGATCCC GAGAAAGAGG CGCAGGACGA GGAGGCAGAA CCCGACTGGC
PSP1-1 CGTGGATCCC GAGAAAGAGG CGCAGGACGA GGAGGCAGAA CCCGACTGGC
PSP1-3 CGTGGATCCC GAGAAAGAGG CGCAGGACGA GGAGGCAGAA CCCGACTGGC
PSP1-4 CGTGGATCCC GAGAAAGAGG CGCAGGACGA GGAGGCAGAA CCCGACTGGC

51 100
PSP1-2 GCGTAGAGCA GCAGCACGAG CAGTAGGAAG CAGTCACCCG GAAGCCTGGG
PSP1-1 GCGTAGAGCA GCAGCACGAG CAGTAGGAAG CAGTCACCCG GAAGCCTGGG
PSP1-3 GCGTAGAGCA GCAGCACGAG CAGTAGGAAG CAGTCACCCG GAAGCCTGGG
PSP1-4 GCGTAGAGCA GCAGCACGAG CAGTAGGAAG CAGTCACCCG GAAGCCTGGG

101 150
PSP1-2 GGCGAGAGGC GAAGTGGTCA GGCGCCGAAG GCCGAGAGCA CGCGGGGATC
PSP1-1 GGCGAGAGGC GAAGTGGTCA GGCGCCGAAG GCCGAGAGCA CGCGGGGATC
PSP1-3 GGCGAGAGGC GAAGTGGTCA GGCGCCGAAG GCCGAGAGCA CGCGGGGATC
PSP1-4 GGCGAGAGGC GAAGTGGTCA GGCGCCGAAG GCCGAGAGCA CGCGGGGATC

151 200
PSP1-2 GGTCTCTTCC CGCCGGGTCT CTTACCGGTG CGAGTCAAAG AGCCGCTCCG
PSP1-1 GGTCTCTTCC CGCCGGGTCT CTTACCGGTG CGAGTCAAAG AGCCGCTCCG
PSP1-3 GGTCTCTTCC CGCCGGGTCT CTTACCGGTG CGAGTCAAAG AGCCGCTCCG
PSP1-4 GGTCTCTTCC CGCCGGGTCT CTTACCGGTG CGAGTCAAAG AGCCGCTCCG

201 250
PSP1-2 GCCCCGGCCC TGAGGGAAGC TCCATAACTG CTGCTTCAGG AGCGCCCGGC
PSP1-1 GCCCCGGCCC TGAGGGAAGC TCCATAACTG CTGCTTCAGG AGCGCCCGGC
PSP1-3 GCCCCGGCCC TGAGGGAAGC TCCATAACTG CTGCTTCAGG AGCGCCCGGC
PSP1-4 GCCCCGGCCC TGAGGGAAGC TCCATAACTG CTGCTTCAGG AGCGCCCGGC

251 300
PSP1-2 CGTCGCCGCC GCCGCCATTT TCGCGCCCGG CCGCAGGGGC TCTTGGGAAG
PSP1-1 CGTCGCCGCC GCCGCCATTT TCGCGCCCGG CCGCAGGGGC TCTTGGGAAG
PSP1-3 CGTCGCCGCC GCCGCCATTT TCGCGCCCGG CCGCAGGGGC TCTTGGGAAG
PSP1-4 CGTCGCCGCC GCCGCCATTT TCGCGCCCGG CCGCAGGGGC TCTTGGGAAG

FIGURE 2A
301 350
PSP1-2 GCGGAGTCTT TGGGCATCCG CCCGGGGTGA GGGGACCCGA AGTCCTGAGG
PSP1-1 GCGGAGTCTT TGGGCATCCG CCCGGGGTGA GGGGACCCGA AGTCCTGAGG
PSP1-3 GCGGAGTCTT TGGGCATCCG CCCGGGGTGA GGGGACCCGA AGTCCTGAGG
PSP1-4 GCGGAGTCTT TGGGCATCCG CCCGGGGTGA GGGGACCCGA AGTCCTGAGG

351 400
PSP1-2 CGCGCCGGAA GGGCTAGCGG TCCCAGCATA CCCCGCGGCC CCTTGGGCCG
PSP1-1 CGCGCCGGAA GGGCTAGCGG TCCCAGCATA CCCCGCGGCC CCTTGGGCCG
PSP1-3 CGCGCCGGAA GGGCTAGCGG TCCCAGCATA CCCCGCGGCC CCTTGGGCCG
PSP1-4 CGCGCCGGAA GGGCTAGCGG TCCCAGCATA CCCCGCGGCC CCTTGGGCCG

401 450
PSP1-2 TCTCACAACT CGCGTCCGGC GGAGACCACA ATTCCCGGCA TTCGTGGGGC
PSP1-1 TCTCACAACT CGCGTCCGGC GGAGACCACA ATTCCCGGCA TTCGTGGGGC
PSP1-3 TCTCACAACT CGCGTCCGGC GGAGACCACA ATTCCCGGCA TTCGTGGGGC
PSP1-4 TCTCACAACT CGCGTCCGGC GGAGACCACA ATTCCCGGCA TTCGTGGGGC

451 500
PSP1-2 AGGGAGGAGT CGGCCTCCCG GAATCCTGGT CCCGGCGTGC ACTTCTGAAG
PSP1-1 AGGGAGGAGT CGGCCTCCCG GAATCCTGGT CCCGGCGTGC ACTTCTGAAG
PSP1-3 AGGGAGGAGT CGGCCTCCCG GAATCCTGGT CCCGGCGTGC ACTTCTGAAG
PSP1-4 AGGGAGGAGT CGGCCTCCCG GAATCCTGGT CCCGGCGTGC ACTTCTGAAG

501 550
PSP1-2 GACTTCAGGT ACCGGCGTGC CCCGCGTCCT ACTGTCCGCC TGCTCGCGTC
PSP1-1 GACTTCAGGT ACCGGCGTGC CCCGCGTCCT ACTGTCCGCC TGCTCGCGTC
PSP1-3 GACTTCAGGT ACCGGCGTGC CCCGCGTCCT ACTGTCCGCC TGCTCGCGTC
PSP1-4 GACTTCAGGT ACCGGCGTGC CCCGCGTCCT ACTGTCCGCC TGCTCGCGTC

551 600
PSP1-2 CTGGGTGCCG CCTCTGAGTA GGGCGGGCGA GGAGGCAGCC AAGGCGGAGC
PSP1-1 CTGGGTGCCG CCTCTGAGTA GGGCGGGCGA GGAGGCAGCC AAGGCGGAGC
PSP1-3 CTGGGTGCCG CCTCTGAGTA GGGCGGGCGA GGAGGCAGCC AAGGCGGAGC
PSP1-4 CTGGGTGCCG CCTCTGAGTA GGGCGGGCGA GGAGGCAGCC AAGGCGGAGC

FIGURE 2B
601 650
PSP1-2 TGATGGCTGC GCCGAGGGCG GGGCGGGGTG CAGGCTGGAG CCTTCGGGCA
PSP1-1 TGATGGCTGC GCCGAGGGCG GGGCGGGGTG CAGGCTGGAG CCTTCGGGCA
PSP1-3 TGATGGCTGC GCCGAGGGCG GGGCGGGGTG CAGGCTGGAG CCTTCGGGCA
PSP1-4 TGATGGCTGC GCCGAGGGCG GGGCGGGGTG CAGGCTGGAG CCTTCGGGCA

651 700
PSP1-2 TGGCGGGCTT TGGGGGGCAT TTGCTGGGGG AGGAGACCCC GTTTGACCCC
PSP1-1 TGGCGGGCTT TGGGGGGCAT TCGCTGGGGG AGGAGACCCC GTTTGACCCC
PSP1-3 TGGCGGGCTT TGGGGGGCAT TCGCTGGGGG AGGAGACCCC GTTTGACCCC
PSP1-4 TGGCGGGCTT TGGGGGGCAT TCGCTGGGGG AGGAGACCCC GTTTGACCCC

701 750
PSP1-2 TGACCTCCGG GCCCTGCTGA CGTCAGGAAC TTCTGACCCC CGGGCCCGAG
PSP1-1 TGACCTCCGG GCCCTGCTGA CGTCAGGAAC TTCTGACCCC CGGGCCCGAG
PSP1-3 TGACCTCCGG GCCCTGCTGA CGTCAGGAAC TTCTGACCCC CGGGCCCGAG
PSP1-4 TGACCTCCGG GCCCTGCTGA CGTCAGGAAC TTCTGACCCC CGGGCCCGAG

751 800
PSP1-2 TGACTTATGG GACCCCCAGT CTCTGGGCCC GGTTGTCTGT TGGGGTCACT
PSP1-1 TGACTTATGG GACCCCCAGT CTCTGGGCCC GGTTGTCTGT TGGGGTCACT
PSP1-3 TGACTTATGG GACCCCCAGT CTCTGGGCCC GGTTGTCTGT TGGGGTCACT
PSP1-4 TGACTTATGG GACCCCCAGT CTCTGGGCCC GGTTGTCTGT TGGGGTCACT

801 850
PSP1-2 GAACCCCGAG CATGCCTGAC GTCTGGGACC CCGGGTCCCC GGGCACAACT
PSP1-1 GAACCCCGAG CATGCCTGAC GTCTGGGACC CCGGGTCCCC GGGCACAACT
PSP1-3 GAACCCCGAG CATGCCTGAC GTCTGGGACC CCGGGTCCCC GGGCACAACT
PSP1-4 GAACCCCGAG CATGCCTGAC GTCTGGGACC CCGGGTCCCC GGGCACAACT

851 900
PSP1-2 GACTGCGGTG ACCCCAGATA CCAGGACCCG GGAGGCCTCA GAGAACTCTG
PSP1-1 GACTGCGGTG ACCCCAGATA CCAGGACCCG GGAGGCCTCA GAGAACTCTG
PSP1-3 GACTGCGGTG ACCCCAGATA CCAGGACCCG GGAGGCCTCA GAGAACTCTG
PSP1-4 GACTGCGGTG ACCCCAGATA CCAGGACCCG GGAGGCCTCA GAGAACTCTG

FIGURE 2C
901 950
PSP1-2 GAACCCGTTC GCGCGCGTGG CTGGCGGTGG CGCTGGGCGC TGGGGGGGCA
PSP1-1 GAACCCGTTC GCGCGCGTGG CTGGCGGTGG CGCTGGGCGC TGGGGGGGCA
PSP1-3 GAACCCGTTC GCGCGCGTGG CTGGCGGTGG CGCTGGGCGC TGGGGGGGCA
PSP1-4 GAACCCGTTC GCGCGCGTGG CTGGCGGTGG CGCTGGGCGC TGGGGGGGCA

951 1000
PSP1-2 GTGCTGTTGT TGTTGTGGGG CGGGGGTCGG GGTCCTCCGG CCGTCCTCGC
PSP1-1 GTGCTGTTGT TGTTGTGGGG CGGGGGTCGG GGTCCTCCGG CCGTCCTCGC
PSP1-3 GTGCTGTTGT TGTTGTGGGG CGGGGGTCGG GGTCCTCCGG CCGTCCTCGC
PSP1-4 GTGCTGTTGT TGTTGTGGGG CGGGGGTCGG GGTCCTCCGG CCGTCCTCGC

1001 1050
PSP1-2 CGCCGTCCCT AGCCCGCCGC CCGCTTCTCC CCGGAGTCAG TACAACTTCA
PSP1-1 CGCCGTCCCT AGCCCGCCGC CCGCTTCTCC CCGGAGTCAG TACAACTTCA
PSP1-3 CGCCGTCCCT AGCCCGCCGC CCGCTTCTCC CCGGAGTCAG TACAACTTCA
PSP1-4 CGCCGTCCCT AGCCCGCCGC CCGCTTCTCC CCGGAGTCAG TACAACTTCA

1051 1100
PSP1-2 TCGCAGATGT GGTGGAGAAG ACAGCACCTG CCGTGGTCTA TATCGAGATC
PSP1-1 TCGCAGATGT GGTGGAGAAG ACAGCACCTG CCGTGGTCTA TATCGAGATC
PSP1-3 TCGCAGATGT GGTGGAGAAG ACAGCACCTG CCGTGGTCTA TATCGAGATC
PSP1-4 TCGCAGATGT GGTGGAGAAG ACAGCACCTG CCGTGGTCTA TATCGAGATC

1101 1150
PSP1-2 CTGGACCGGC ACCCTTTCTT GGGCCGCGAG GTCCCTATCT CGAACGGCTC
PSP1-1 CTGGACCGGC ACCCTTTCTT GGGCCGCGAG GTCCCTATCT CGAACGGCTC
PSP1-3 CTGGACCGGC ACCCTTTCTT GGGCCGCGAG GTCCCTATCT CGAACGGCTC
PSP1-4 CTGGACCGGC ACCCTTTCTT GGGCCGCGAG GTCCCTATCT CGAACGGCTC

1151 1200
PSP1-2 AGGATTCGTG GTGGCTGCCG ATGGGCTCAT TGTCACCAAC GCCCATGTGG
PSP1-1 AGGATTCGTG GTGGCTGCCG ATGGGCTCAT TGTCACCAAC GCCCATGTGG
PSP1-3 AGGATTCGTG GTGGCTGCCG ATGGGCTCAT TGTCACCAAC GCCCATGTGG
PSP1-4 AGGATTCGTG GTGGCTGCCG ATGGGCTCAT TGTCACCAAC GCCCATGTGG
1201 1250
PSP1-2 TGGCTGATCG GCGCAGAGTC CGTGTGAGAC TGCTAAGCGG CGACACGTAT
PSP1-1 TGGCTGATCG GCGCAGAGTC CGTGTGAGAC TGCTAAGCGG CGACACGTAT
PSP1-3 TGGCTGATCG GCGCAGAGTC CGTGTGAGAC TGCTAAGCGG CGACACGTAT
PSP1-4 TGGCTGATCG GCGCAGAGTC CGTGTGAGAC TGCTAAGCGG CGACACGTAT

1251 1300
PSP1-2 GAGGCCGTGG TCACAGCTGT GGATCCCGTG GCAGACATCG CAACGCTGAG
PSP1-1 GAGGCCGTGG TCACAGCTGT GGATCCCGTG GCAGACATCG CAACGCTGAG
PSP1-3 GAGGCCGTGG TCACAGCTGT GGATCCCGTG GCAGACATCG CAACGCTGAG
PSP1-4 GAGGCCGTGG TCACAGCTGT GGATCCCGTG GCAGACATCG CAACGCTGAG

1301 1350
PSP1-2 GATTCAGACT AAGGAGCCTC TCCCCACGCT GCCTCTGGGA CGCTCAGCTG
PSP1-1 GATTCAGACT AAGGAGCCTC TCCCCACGCT GCCTCTGGGA CGCTCAGCTG
PSP1-3 GATTCAGACT AAGGAGCCTC TCCCCACGCT GCCTCTGGGA CGCTCAGCTG
PSP1-4 GATTCAGACT AAGGAGCCTC TCCCCACGCT GCCTCTGGGA CGCTCAGCTG

1351 1400
PSP1-2 ATGTCCGGCA AGGGGAGTTT GTTGTTGCCA TGGGAAGTCC CTTTGCACTG
PSP1-1 ATGTCCGGCA AGGGGAGTTT GTTGTTGCCA TGGGAAGTCC CTTTGCACTG
PSP1-3 ATGTCCGGCA AGGGGAGTTT GTTGTTGCCA TGGGAAGTCC CTTTGCACTG
PSP1-4 ATGTCCGGCA AGGGGAGTTT GTTGTTGCCA TGGGAAGTCC CTTTGCACTG

1401 1450
PSP1-2 CAGAACACGA TCACATCCGG CATTGTTAGC TCTGCTCAGC GTCCAGCCAG
PSP1-1 CAGAACACGA TCACATCCGG CATTGTTAGC TCTGCTCAGC GTCCAGCCAG
PSP1-3 CAGAACACGA TCACATCCGG CATTGTTAGC TCTGCTCAGC GTCCAGCCAG
PSP1-4 CAGAACACGA TCACATCCGG CATTGTTAGC TCTGCTCAGC GTCCAGCCAG

1451 1500
PSP1-2 AGACCTGGGA CTCCCCCAAA CCAATGTGGA ATACATTCAA ACTGATGCAG
PSP1-1 AGACCTGGGA CTCCCCCAAA CCAATGTGGA ATACATTCAA ACTGATGCAG
PSP1-3 AGACCTGGGA CTCCCCCAAA CCAATGTGGA ATACATTCAA ACTGATGCAG
PSP1-4 AGACCTGGGA CTCCCCCAAA CCAATGTGGA ATACATTCAA ACTGATGCAG
1501 1550
PSP1-2 CTATTGATTT TGGAAACTCT GGAGGTCCCC TGGTTAACCT
PSP1-1 CTATTGATTT TGGAAACTCT GGAGGTCCCC TGGTTAACCT
PSP1-3 CTATTGATTT TGGAAACTCT GGAGGTCCCC TGGTTAACCT GGTGAGTGAG
PSP1-4 CTATTGATTT TGGAAACTCT GGAGGTCCCC TGGTTAACCT

1551 1600
PSP1-2
PSP1-1
PSP1-3 ACATCCTTCC TTCCAAGAAT CCCTGCCCCA GGTCAGTGTG GGAAGGGTAG
PSP1-4

1601 1650
PSP1-2
PSP1-1
PSP1-3 GTTTCCCCTA ATTCAAGGAT GTTTGGTCAA GTTTCTGAGC AGTTCTTTGT
PSP1-4

1651 1700
PSP1-2
PSP1-1
PSP1-3 TGGCTATCTC TCAATATCCA ACCAGATCTC CCCAACACTT GCTGGTACTT
PSP1-4

1701 1750
PSP1-2
PSP1-1
PSP1-3 TTGTTCGGGT GCCCCCATCC CCTACTATTT GTTTAGGCTA GGGAACTGGG
PSP1-4                                       GGCTA GGGAACTGGG

1751 1800
PSP1-2                 GGATG GGGAGGTGAT TGGAGTGAAC ACCATGAAGG
PSP1-1                 GGATG GGGAGGTGAT TGGAGTGAAC ACCATGAAGG
PSP1-3 GGCTGTATCC CTGCAGGATG GGGAGGTGAT TGGAGTGAAC ACCATGAAGG
PSP1-4 GGCTGTATCC CTGCAGGATG GGGAGGTGAT TGGAGTGAAC ACCATGAAGG
1801 1850
PSP1-2 TCACAGCTGG AATCTCCTTT GCCATCCCTT CTGATCGTCT TCGAGAGTTT
PSP1-1 TCACAGCTGG AATCTCCTTT GCCATCCCTT CTGATCGTCT TCGAGAGTTT
PSP1-3 TCACAGCTGG AATCTCCTTT GCCATCCCTT CTGATCGTCT TCGAGAGTTT
PSP1-4 TCACAGCTGG AATCTCCTTT GCCATCCCTT CTGATCGTCT TCGAGAGTTT

1851 1900
PSP1-2 CTGCATCGTG GGGAAAAGAA GAATTCCTCC TCCGGAATCA GTGGGTCCCA 
PSP1-1 CTGCATCGTG GGGAAAAGAA GAATTCCTCC TCCGGAATCA GTGGGTCCCA 
PSP1-3 CTGCATCGTG GGGAAAAGAA GAATTCCTCC TCCGGAATCA GTGGGTCCCA 
PSP1-4 CTGCATCGTG GGGAAAAGAA GAATTCCTCC TCCGGAATCA GTGGGTCCCA 

1901 1950 
PSP1-2 GCGGCGCTAC ATTGGGGTGA TGATGCTGAC CCTGAGTCCC AGCATCCTTG 
PSP1-1 GCGGCGCTAC ATTGGGGTGA TGATGCTGAC CCTGAGTCCC AGCATCCTTG 
PSP1-3 GCGGCGCTAC ATTGGGGTGA TGATGCTGAC CCTGAGTCCC AGCATCCTTG 
PSP1-4 GCGGCGCTAC ATTGGGGTGA TGATGCTGAC CCTGAGTCCC A 

1951 2000 
PSP1-2 CTGAACTACA GCTTCGAGAA CCAAGCTTTC CCGATGTTCA GCATGGTGTA 
PSP1-1 CTGAACTACA GCTTCGAGAA CCAAGCTTTC CCGATGTTCA GCATGGTGTA 
PSP1-3 CTGAACTACA GCTTCGAGAA CCAAGCTTTC CCGATGTTCA GCATGGTGTA 
PSP1-4 

2001 2050 
PSP1-2 CTCATCCATA AAGTCATCCT GGGCTCCCCT GCACACCGGG CTGGTCTGCG 
PSP1-1 CTCATCCATA AAGTCATCCT GGGCTCCCCT GCACACCGGG CTGGTCTGCG 
PSP1-3 CTCATCCATA AAGTCATCCT GGGCTCCCCT GCACACCGGG CTGGTCTGCG 
PSP1-4                                         GGG CTGGTCTGCG

2051 2100 
PSP1-2 GCCTGGTGAT GTGATTTTGG CCATTGGGGA GCAGATGGTA CAAAATGCTG 
PSP1-1 GCCTGGTGAT GTGATTTTGG CCATTGGGGA GCAGATGGTA CAAAATGCTG 
PSP1-3 GCCTGGTGAT GTGATTTTGG CCATTGGGGA GCAGATGGTA CAAAATGCTG 
PSP1-4 GCCTGGTGAT GTGATTTTGG CCATTGGGGA GCAGATGGTA CAAAATGCTG 
2101 2150
PSP1-2 AAGATGTTTA TGAAGCTGTT CGAACCCAAT CCCAGTTGGC AGTGCAGATC
PSP1-1 AAGATGTTTA TGAAGCTGTT CGAACCCAAT CCCAGTTGGC AGTGCAGATC
PSP1-3 AAGATGTTTA TGAAGCTGTT CGAACCCAAT CCCAGTTGGC AGTGCAGATC
PSP1-4 AAGATGTTTA TGAAGCTGTT CGAACCCAAT CCCAGTTGGC AGTGCAGATC

2151 2200
PSP1-2 CGGCGGGGAC GAGAAACACT GACCTTATAT GTGACCCCTG AGGTCACAGA
PSP1-1 CGGCGGGGAC GAGAAACACT GACCTTATAT GTGACCCCTG AGGTCACAGA
PSP1-3 CGGCGGGGAC GAGAAACACT GACCTTATAT GTGACCCCTG AGGTCACAGA
PSP1-4 CGGCGGGGAC GAGAAACACT GACCTTATAT GTGACCCCTG AGGTCACAGA

2201 2250
PSP1-2 ATGAATAGAT CACCAAGAGT ATGAGGCTCC TGCTCTGATT TCCTCCTTGC
PSP1-1 ATGAATAGAT CACCAAGAGT ATGAGGCTCC TGCTCTGATT TCCTCCTTGC
PSP1-3 ATGAATAGAT CACCAAGAGT ATGAGGCTCC TGCTCTGATT TCCTCCTTGC
PSP1-4 ATGAATAGAT CACCAAGAGT ATGAGGCTCC TGCTCTGATT TCCTCCTTGC

2251 2300
PSP1-2 CTTTCTGGCT GAGGTTCTGA GGGCACCGAG ACAGAGGGTT AAATGAACCA
PSP1-1 CTTTCTGGCT GAGGTTCTGA GGGCACCGAG ACAGAGGGTT AAATGAACCA
PSP1-3 CTTTCTGGCT GAGGTTCTGA GGGCACCGAG ACAGAGGGTT AAATGAACCA
PSP1-4 CTTTCTGGCT GAGGTTCTGA GGGCACCGAG ACAGAGGGTT AAATGAACCA

2301 2350
PSP1-2 GTGGGGGCAG GTCCCTCCAA CCACCAGCAC TGACTCCTGG GCTCTGAAGA
PSP1-1 GTGGGGGCAG GTCCCTCCAA CCACCAGCAC TGACTCCTGG GCTCTGAAGA
PSP1-3 GTGGGGGCAG GTCCCTCCAA CCACCAGCAC TGACTCCTGG GCTCTGAAGA
PSP1-4 GTGGGGGCAG GTCCCTCCAA CCACCAGCAC TGACTCCTGG GCTCTGAAGA

2351 2400
PSP1-2 ATCACAGAAA CACTTTTTAT ATAAAATAAA ATTATACCTA GCAACATAAA
PSP1-1 ATCACAGAAA CACTTTTTAT ATAAAATAAA ATTATACCTA GCAACATAAA
PSP1-3 ATCACAGAAA CACTTTTTAT ATAAAATAAA ATTATACCTA GCAACATATT
PSP1-4 ATCACAGAAA CACTTTTTAT ATAAAATAAA ATTATACCTA GCAAAAAAAA
2401 2450
PSP1-2
PSP1-1 AAAAAAAAAA AA
PSP1-3 ATAGTAAAAA ATGAGGTGGG AGGGCTGGAT CTTTTCCCCC ACCAAAAGGC
PSP1-4 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA

2451 2500
PSP1-2
PSP1-1
PSP1-3 TAGAGGTAAA GCTGTATCCC CCTAAACTTA GGGGAGATAC TGGAGCTGAC
PSP1-4

2501 2550
PSP1-2
PSP1-1
PSP1-3 CATCCTGACC TCCTATTAAA GAAAATGAGC TGCTGAAAAA AAAAAAAAAA
PSP1-4

2551
PSP1-2
PSP1-1
PSP1-3 A
PSP1-4
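
Because the Figure 2 alignment introduces gap columns, its column numbers differ from the nucleotide numbering of the individual sequences beyond column 1540. The following is a minimal Python sketch of the conversion. The gap intervals are an interpretation of the figure, not values stated in the text: PSP1-1 appears to have no bases in columns 1541 to 1765, and PSP1-4 none in columns 1541 to 1735 and 1942 to 2037. The function name is hypothetical.

```python
# Gap intervals (inclusive alignment columns) as read from Figure 2; assumptions.
GAPS = {
    "PSP1-1": [(1541, 1765)],
    "PSP1-4": [(1541, 1735), (1942, 2037)],
}


def sequence_position(column, gaps):
    """Map an alignment column to a position in the ungapped sequence."""
    skipped = 0
    for first, last in gaps:
        if first <= column <= last:
            raise ValueError("column falls inside a gap")
        if last < column:
            skipped += last - first + 1
    return column - skipped


STOP_LAST_COLUMN = 2204  # last base of the TGA stop codon in the alignment
for name, gaps in GAPS.items():
    print(name, sequence_position(STOP_LAST_COLUMN, gaps))
# PSP1-1 1979
# PSP1-4 1913
```

With these intervals, the end of the stop codon maps to nucleotides 1979 and 1913, the same end points recited in claim 2 for PSP1-1 and PSP1-4.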
1 MAAPRAGRGAGWSLRAWRALGGIRWGRRPRLTPDLRALLTSGTS 44
16 LAAPASAQLSRAGRSAPLAAGCPDRCEPARCPPQPEHCE 54

45 DPRARVTYGTPSLWARLSVGVTEPRACLTSGTPGPRAQLTAVTP 88
55 GGRARDACGCCRVCGAPEGAACGLQEGPCGEGLQCVVPFGVPASA 99

89 DTRTREASENSGTRSRAWLAVALGAGGAVLLLLWGGGRGPPAV 131
100 TVRRRAQAGLCVCASSEPVCGSDANTYANLCQLRAASRRSERLHRPPVIV 149

132 LAAVPSPPPASPRSQYNFIADVVEKTAPAVVYIEILDRHPFLGREV 177
150 LQRGACGQGQEDPNSLRHKYNFIADVVEKIAPAVVHIELFRKLPFSKREV 199

178 PISNGSGFVVAADGLIVTNAHVVADRRRVRVRLLSGDTYEAVVTAVDPVA 227
200 PVASGSGFIVSEDGLIVTNAHVVTNKHRVKVELKNGATYEAKIKDVDEKA 249

228 DIATLRIQTKEPLPTLPLGRSADVRQGEFVVAMGSPFALQNTITSGIVSS 277
250 DIALIKIDHQGKLPVLLLGRSSELRPGEFVVAIGSPFSLQNTVTTGIVST 299

278 AQRPARDLGLPQTNVEYIQTDAAIDFGNSGGPLVNLDGEVIGVNTMKVTA 327
300 TQRGGKELGLRNSDMDYIQTDAIINYGNSGGPLVNLDGEVIGINTLKVTA 349

328 GISFAIPSDRLREFLHRGEKKNSSSGISGSQRRYIGVMMLTLSPSILAEL 377
350 GISFAIPSDKIKKFLTESHDRQ.AKGKAITKKKYIGIRMMSLTSSKAKEL 398

378 QLREPSFPDVQHGVLIHKVILGSPAHRAGLRPGDVILAIGEQMVQNAEDV 427
399 KDRHRDFPDVISGAYIIEVIPDTPAEAGGLKENDVIISINGQSVVSANDV 448

428 YEAVRTQSQLAVQIRRGRETLTLYVTPEVTE 458
449 SDVIKRESTLNMVVRRGNEDIMITVTPEEID 479
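
For rows of this alignment that contain no gaps, the proportion of identical positions can be recomputed directly from the two sequence lines. The sketch below is a minimal Python illustration using the row that pairs residues 178 to 227 of the first sequence with residues 200 to 249 of the second. The helper name is hypothetical, and only exact identities are counted, not conservative substitutions.

```python
# One gap-free alignment row, written in ten-residue groups to ease checking.
TOP = "".join(["PISNGSGFVV", "AADGLIVTNA", "HVVADRRRVR", "VRLLSGDTYE", "AVVTAVDPVA"])
BOTTOM = "".join(["PVASGSGFIV", "SEDGLIVTNA", "HVVTNKHRVK", "VELKNGATYE", "AKIKDVDEKA"])


def identity(a, b):
    """Return (identical positions, aligned columns) for two gap-free rows."""
    if len(a) != len(b):
        raise ValueError("rows must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, len(a)


matches, columns = identity(TOP, BOTTOM)
print(f"{matches}/{columns} identical ({100 * matches / columns:.0f}%)")
```

Rows that contain gaps would need the gap columns restored before the same count could be applied.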
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